PREFACE


The MPSA international conference is held in a different country every two years. It is devoted to methods of determining protein structure, with emphasis on chemistry and sequence analysis. Until the ninth conference, MPSA was an acronym for Methods in Protein Sequence Analysis. To give the conference more flexibility and breadth, the Scientific Advisory Committee of the 10th MPSA decided to change the name to Methods in Protein Structure Analysis; however, the emphasis remains on “methods” and on “chemistry.” In fact, this is the only major conference that is devoted to methods.

The MPSA conference is truly international, a fact clearly reflected by the composition of its Scientific Advisory Committee. The Scientific Advisory Committee oversees the scientific direction of the MPSA and elects the chairman of the conference. Members of the committee are elected by active members on the basis of scientific standing and activity. The chairman, subject to the approval of the Scientific Advisory Committee, appoints the Organizing Committee, and it is this latter committee that puts the conference together.

The lectures of the MPSA have traditionally been published in a special proceedings volume. This is different from, and more detailed than, the special MPSA issue of the Journal of Protein Chemistry, in which the talks are described only briefly in short papers and abstracts. At the 10th MPSA, about half of the talks were given by invited speakers and the remainder were selected from submitted short papers and abstracts. Including submitted contributions in the oral program is an important mechanism for bringing new discoveries and innovations to the forefront. These proceedings are divided into eight topics: (1) preparation of proteins and peptides for microsequence analysis; (2) N-terminal sequence analysis; (3) C-terminal sequence analysis; (4) mass spectrometry; (5) new strategies for protein and peptide characterization; (6) immunological recognition, phage and synthetic libraries; (7) analysis of protein structures of special interest; and (8) database analysis, protein folding, and three-dimensional structures of proteins. We believe that the chapters in this book will provide a timely resource for the analysis of protein structures, which constitutes an indispensable part of contemporary biochemistry and molecular biology. Protein structure analysis continues to progress in line with other developments in modern biochemistry, molecular biology, and biophysics, and is essential for the design of therapeutic agents for the control of human diseases.

Conferences of this size (493 participants) require considerable funding. We would like to express our gratitude to our sponsors; without their support and generosity this meeting would not have been possible. We especially thank Millipore, Inc. for supporting the Edman Award. Because of this wonderful support and the excellent registration, we were able to offer assistance for attending the conference to six junior scientists and thirteen students. Together with the International Science Foundation, we co-sponsored six scientists from the former Soviet republics of Russia and Uzbekistan.


The editors wish to thank Marie Pellum, Kella Kunz, Priscilla Igori, and Shana Atassi for their invaluable assistance in the organization of the conference, and especially Priscilla Igori and Shana Atassi for their help in the organization and preparation of this book.

M. Zouhair Atassi
Ettore Appella


THE EDMAN AWARDS 1994


On behalf of the Pehr Edman Award Selection Committee of the 10th MPSA and Millipore Corporation, it is with great pleasure that I have been asked to write a few words on the 1994 awardees and on the prize itself. To understand the contributions that Dr. Ruedi Aebersold and Dr. Joel Vandekerckhove have made to this field, the pioneering work of Pehr Edman should first be acknowledged.

The field that has immortalized his name had its early roots in Pehr Edman’s studies on bovine angiotensin in 1946 at the Karolinska Institute in Stockholm. He observed that neither the molecular mass nor the amino acid composition provided information on the biological activity of this small, physiologically important molecule. During his stay at the Rockefeller Institute in Princeton from 1946 to 1947, he examined different reagents that could react with the amino group of peptides. This led him to introduce phenylisothiocyanate for the thiocarbamoylation of reactive amines on model peptides. With this tool, the door was opened to the systematic elucidation of the primary structure of polypeptides. It is evident from the proceedings of the 10th MPSA that this chemistry is still alive as we move its analytical capabilities in new directions, all aimed at accelerating sequencing. However, the basic chemical approach has not changed since Pehr Edman recognized the reactivity of the amino group in peptides and proteins. He also recognized that the chemical degradation of the amino terminus is a three-stage process in which it is critical that the thiazolinone derivative be removed from the parent peptide and converted to the stable phenylthiohydantoin (PTH) derivative in a separate vessel. This was the only way to achieve high repetitive yields, a prerequisite for unraveling long peptide sequences.

Since the early work of Pehr Edman, many contributions have led to the emergence of Edman degradation as the primary protein structural characterization tool that it is today. The selection of the recipients of the 1994 Edman Award was a difficult one. Two individuals are recognized this year: Dr. Joel Vandekerckhove and Dr. Ruedi Aebersold. Their contributions have helped make it possible to sequence samples from electrophoretic separations after electroblotting them onto solid phases. In their earlier work, both recognized the importance of the solid phase in presenting the sample to the instrumentation and chemistry of the modern sequencer. At Millipore we have seen many applications develop in this field following the introduction of the polyvinylidene fluoride (PVDF) membrane in 1986 and its recognition as a sequencing support in 1987. At the 10th MPSA we were clearly still learning about the importance of the solid phase in this approach. Drs. Vandekerckhove and Aebersold are both actively moving this field forward and were selected by the 10th MPSA Award Selection Committee to receive the Edman Award, sponsored by Millipore.

To introduce the awardees, who are being honored in memory of Pehr Edman, the accomplishments and short biographies of Dr. Joel Vandekerckhove and Dr. Ruedi Aebersold are briefly outlined here.


Dr. Pehr Edman


JOEL VANDEKERCKHOVE

Dr. Joel Vandekerckhove was born in Belgium and obtained a degree in chemistry at the University of Ghent. In 1967, he started his Ph.D. work on the sequence determination of the coat protein of bacteriophage MS2, work that brought him directly to the heart of protein chemistry. It also brought him into contact with Dr. Klaus Weber at the Max Planck Institute for Biophysical Chemistry in Göttingen. There, he started a second period in his research career, studying the molecular basis of the different isoforms of actin and analyzing their expression in different cells and tissues.

In 1981, Vandekerckhove returned to his old haunts in Ghent, where he continued the work on isoactin expression. That work became a classic in cell biology. To solve some of the problems it posed, Vandekerckhove introduced protein electroblotting techniques; the original method employed polybase-coated glass-fiber membranes as the immobilizing support. Similar techniques were developed at that time in the laboratory of Professor Lee Hood at the California Institute of Technology (Cal Tech), together with Drs. Steven Kent and Ruedi Aebersold. The success of this method is now well known.

Close contact with the successful plant-engineering group of Professor M. Van Montagu at the University of Ghent enticed Joel Vandekerckhove to use transgenic plants for the production of bioactive peptides as part of hybrid seed storage proteins. This was the first step towards molecular farming, offering interesting prospects for increasing the nutritional value of seeds.

The protein-chemical micropreparation techniques developed in the meantime were applied to the construction of a 2D gel database in collaboration with the group of Professor J. Celis. This is one of the largest databases of its kind.

In 1990, Vandekerckhove became head of the Department of Biochemistry at the Medical Faculty of the University of Ghent (an unusual appointment for a nonphysician). Since then, his research has concentrated on the molecular mechanisms underlying the organization of the microfilament system in the cell. In particular, he has addressed the problem of the multiple interactions between actin and actin-binding proteins, a complex protein-protein docking problem.


RUEDI AEBERSOLD

Dr. Aebersold is a native of Switzerland, where he grew up and received his education. He graduated in 1983 with a doctoral degree in cell biology from the Biocenter, University of Basel. His thesis work, carried out in the laboratory of Dr. J.Y. Chang at Ciba-Geigy in Basel, involved the sequence analysis of monoclonal antibodies directed against streptococcal group carbohydrates.

After this, he came to the U.S. for a postdoctoral fellowship with Dr. Lee Hood at Cal Tech. He stayed at Cal Tech from 1984 to 1988, where he and his co-workers worked out several protein analytical techniques. The most notable may be the isolation of proteins for N-terminal sequencing by electroblotting from polyacrylamide gels; a procedure for internal sequence analysis of small amounts of gel-separated and electroblotted proteins by in situ digestion on nitrocellulose membranes; and the application of these techniques to proteins separated by 2D gel electrophoresis. With this work, they attempted to make protein sequencing compatible with the techniques most commonly used in the biochemistry laboratory.

In 1988, Aebersold moved to the Biomedical Research Center at the University of British Columbia in Vancouver to take an Assistant Professorship in the Department of Biochemistry. In Vancouver, he started working on the delineation of intracellular signal transduction pathways using a protein analytical approach. Initially, he characterized several signaling proteins by sequencing. He developed protocols for determining sites of Ser, Thr and Tyr phosphorylation by solid-phase sequencing, which were then used to map sites of protein phosphorylation.

More recently, he has worked on developing combined chemical and mass spectrometric protocols for determining sites of protein phosphorylation at high sensitivity. He has demonstrated that the interaction between the kinase ZAP-70 and the phosphorylated TCR zeta chain is mediated by tyrosine phosphorylation, is essential for T-cell receptor signaling, and can be interrupted with a synthetic analog modeled after the phosphopeptide sequence.

In 1993, he moved to the Department of Molecular Biotechnology at the University of Washington in Seattle to take a position as Associate Professor and Associate Director of the NSF Science and Technology Center on Molecular Biotechnology. His work continued to focus on the development and application of protein analytical technology.

The general aim of his work is, therefore, the development of analytical technology that can be directly interfaced with experiments in the typical biochemistry laboratory to answer biological problems.
Senior Consulting Scientist
CHARACTERIZATION OF PROTEINS SEPARATED BY GEL ELECTROPHORESIS AT THE PRIMARY STRUCTURE LEVEL


Ruedi Aebersold,* Lawrence N. Amankwa, Heinz Nika, David T. Chow, Edward J. Bures, Hamish D. Morrison, and Daniel Hess
Department of Molecular Biotechnology
University of Washington
Seattle, Washington

The Biomedical Research Centre
University of British Columbia
Vancouver, B.C., Canada


INTRODUCTION 

Studies of cell differentiation, development and signal transduction pathways are examples of current research projects that share a focus on complex, highly regulated systems consisting of numerous interacting elements. A complete understanding of such processes can only be achieved if the problem is approached globally, considering the temporal and spatial interactions of all the elements involved. This task is supported by large amounts of data stored and annotated in databases such as nucleic acid and amino acid sequence databases and two-dimensional (2D) protein databases.
Washington FJ-20, Seattle, WA 98195, USA. Phone 206 685 4235, Fax 206 685 6932.
Current address: Friedrich-Miescher Institute, Basel, Switzerland.
ABBREVIATIONS: 2D: two-dimensional; 2DE: 2D gel electrophoresis; IEF: isoelectric focusing; SDS-PAGE: sodium dodecyl sulfate-polyacrylamide gel electrophoresis; RP-HPLC: reverse-phase high performance liquid chromatography; ESI-MS: electrospray ionization mass spectrometer/spectrometry; PITC 311: 4-(3-pyridinylmethylaminocarboxypropyl)phenyl isothiocyanate; 311 PTH: 4-(3-pyridinylmethylaminocarboxypropyl)phenyl thiohydantoin; CE: capillary electrophoresis; PTM: post-translational modification; ER: enzyme reactor.
The number of sequences entered in sequence databases is growing exponentially, at least in part because of coordinated large-scale sequencing programs (e.g., Dujon et al., 1994). Such genome sequencing efforts will result in the determination of the complete genome sequences of a few species within the current decade (Collins and Galas, 1993). In addition, systematic cDNA sequencing projects generate increasingly complete databases of the genes expressed in specific tissues. Although access to the most advanced cDNA databases is currently limited, it is expected and hoped that this resource will eventually become generally accessible.

While sequence databases are useful for answering specific questions, the linear structure of the stored information lacks dimensions that are essential for biologists. Spatial and temporal expression profiles and expression levels, regulatory features including post-translational modifications and polypeptide processing, protein trafficking and turnover, interactions with other elements, and the integration of numerous components into complex pathways are examples of information that is not directly encoded in the DNA sequence and therefore cannot be extracted from sequence databases alone.

2D protein databases, which display and annotate in a 2D pattern the expressed and fully processed protein components of a cell or tissue, represent an alternative format to globally store and display information. While the term 2D in this context usually refers to the two dimensions [isoelectric focusing (IEF) and SDS-polyacrylamide gel electrophoresis (SDS-PAGE)] used in a gel electrophoresis experiment to separate the hundreds or thousands of proteins in an extract into a 2D pattern, the information content of a 2D protein database is in fact multidimensional. Data dimensions that can be obtained by simple experiments and/or subtractive pattern analysis, and that can easily be integrated into 2D protein databases, include temporal and spatial expression levels, information on regulatory features mediated by covalent protein modifications, protein trafficking and turnover, and information on the interaction of polypeptides with other components to form functional protein complexes. 2D gel electrophoresis by itself, however, does not provide any structural information on the separated species.

The information contained in sequence databases and in 2D protein databases is therefore complementary but not easily linked. Here we describe our approach towards a rapid, sensitive and conclusive analysis of the complete covalent structure of gel-separated proteins and illustrate the current status of these projects with selected results. We are pursuing two main objectives: i) characterization of gel-separated proteins by their amino acid sequence, with the aim of correlating a protein spot in a 2D protein database with the corresponding entry in a DNA sequence database; and ii) characterization of protein modifications, with the aim of understanding regulatory features and protein processing pathways.


IDENTIFICATION OF PROTEINS SEPARATED BY GEL ELECTROPHORESIS

It is frequently required in biological research projects that protein species represented as bands or spots separated by gel electrophoresis be further characterized. Such proteins may be detected by comparative protein pattern analysis or western blotting, or as dominant species in samples prepared by protein purification. Characterization of such proteins by their amino acid sequence not only represents the most conclusive criterion for protein identification but also provides a unique basis for further experimentation, such as cloning of the corresponding gene, altering specific activities by site-directed mutagenesis, and modulating temporal and spatial protein expression patterns. In addition, limited sequence information is sufficient to unambiguously establish structural relationships between proteins of comparable electrophoretic mobilities and to characterize protein isoforms with different electrophoretic mobilities generated by differential protein processing or modification.

Figure 1. The protein chemistry workstation.

To rapidly, sensitively and reliably characterize proteins separated by gel electrophoresis at the level of the amino acid sequence, we have assembled the protein chemistry workstation shown in Fig. 1. The system consists of a gel electrophoresis unit, a reverse-phase high performance liquid chromatography (RP-HPLC) system, an electrospray ionization mass spectrometer (ESI-MS), a fraction collector, a protein sequencer and a data system.

The system was operated as follows. Proteins separated by gel electrophoresis (SDS-PAGE, IEF or 2D gel electrophoresis (2DE)) were electrophoretically transferred from the gel either onto nitrocellulose or onto a membrane with a cationic surface (Immobilon CD; Millipore Corp.), detected by staining, and enzymatically cleaved as described (Aebersold et al., 1987; Patterson et al., 1992). Recovered peptides were separated by RP-HPLC and analyzed by on-line ESI-MS. Between the outlet of the chromatography column and the MS ion source we inserted a flow-splitting device that diverted approximately 10% of the column effluent into the mass spectrometer, while the remaining 90% of the sample was collected for further analysis such as peptide sequencing (Hess et al., 1993). Since ESI-MS is essentially a concentration-dependent detector, splitting the column effluent did not significantly reduce the sensitivity of peptide detection. The LC-MS peptide mapping experiment therefore yields the masses of peptides derived from the protein under investigation with minimal sample loss. In addition to indicating the degree of purity and homogeneity of the collected peptide fractions, the peptide masses can be used to identify a protein in a sequence database (Fig. 2) with any one of a number of peptide mass search algorithms that were developed independently in the last two years (Henzel et al. 1993; James et al. 1993; Mann et al. 1993; Pappin et al. 1994; Yates et al. 1993).
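Why splitting costs so little sensitivity can be illustrated with a small calculation. This Python sketch assumes an ideal post-column splitter that divides the flow by volume without diluting it; the peak amount and peak volume are invented, and only the approximate 10:90 split ratio comes from the text.

```python
def split_effluent(amount_fmol, peak_volume_ul, ms_fraction=0.10):
    """Divide one eluting peptide peak between the ESI-MS and the fraction collector.

    Assumes an ideal splitter (hypothetical): the stream is divided by volume only,
    so the analyte concentration is the same in both branches.
    """
    conc_column = amount_fmol / peak_volume_ul                   # fmol/uL leaving the column
    amount_ms = amount_fmol * ms_fraction                        # consumed by the MS
    amount_collected = amount_fmol * (1.0 - ms_fraction)         # kept for sequencing
    conc_ms = amount_ms / (peak_volume_ul * ms_fraction)         # concentration seen by the MS
    conc_collected = amount_collected / (peak_volume_ul * (1.0 - ms_fraction))
    return conc_column, conc_ms, conc_collected, amount_ms, amount_collected


# Hypothetical peak: 1000 fmol of peptide eluting in 20 uL.
conc, conc_ms, conc_coll, to_ms, to_collector = split_effluent(1000.0, 20.0)
print(f"concentration at column outlet:  {conc:.1f} fmol/uL")
print(f"concentration reaching the MS:   {conc_ms:.1f} fmol/uL")
print(f"amount collected for sequencing: {to_collector:.0f} fmol")
```

A concentration-sensitive detector responds to `conc_ms`, which equals the undivided concentration, so the MS signal is essentially unchanged while 90% of the material is retained. A detector sensitive to mass flow would instead lose signal roughly in proportion to the split ratio.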
Figure 2. Protein identification by peptide mass search.


Illustration of the peptide mass database search procedure. Required entries for a search are the experimentally determined peptide masses, the estimated mass of the intact polypeptide (as determined by gel electrophoretic mobility) and the type of enzyme used for protein cleavage. Search results specify the protein name, originating species, sequence database accession code and more. The displayed example used the MOWSE search algorithm (Pappin et al. 1993) to search the OWL sequence database (Akrigg et al. 1988; Bleasby et al. 1990).
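The search inputs listed above (peptide masses, intact protein mass, cleavage enzyme) can be tied together in a short sketch. This Python example is a minimal illustration, not MOWSE: it ranks entries by a plain count of matched masses instead of MOWSE's frequency-based scoring, and it uses a simplified trypsin rule (cut after K or R unless followed by P, no missed cleavages). Neutral monoisotopic masses, the tolerance and intact-mass window values, and the toy sequences are all hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def tryptic_peptides(sequence):
    """Simplified in-silico trypsin digest: cut after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def count_matches(observed_masses, sequence, tolerance):
    """Number of observed peptide masses explained by the theoretical digest."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(1 for m in observed_masses
               if any(abs(m - t) <= tolerance for t in theoretical))


def search(observed_masses, database, protein_mass=None, window=0.25, tolerance=0.5):
    """Rank database entries by matched peptide masses (highest count first).

    protein_mass: optional intact mass estimated from gel mobility; entries whose
    calculated mass lies outside +/- window * protein_mass are skipped.
    """
    hits = []
    for name, sequence in database.items():
        if protein_mass is not None and abs(peptide_mass(sequence) - protein_mass) > window * protein_mass:
            continue
        hits.append((count_matches(observed_masses, sequence, tolerance), name))
    return sorted(hits, reverse=True)


# Toy example with invented sequences: "observe" the digest of toy_1, then search.
database = {"toy_1": "AGKLLRPEK", "toy_2": "GGGRAAAK"}
observed = [peptide_mass(p) for p in tryptic_peptides(database["toy_1"])]
print(search(observed, database))  # [(2, 'toy_1'), (0, 'toy_2')]
```

Practical search programs also consider missed cleavages, modified residues and statistically motivated scores; the simple count here only shows the principle of matching observed peptide masses against theoretical digests.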

In cases in which the peptide mass database search could not identify the protein, or could not do so conclusively, the collected homogeneous peptide fractions were subjected to automated peptide sequencing. To enhance the sensitivity of peptide sequencing, we have developed a new degradation chemistry that uses ESI-MS to detect the degradation products. The chemistry is based on the reagent 4-(3-pyridinylmethylaminocarboxypropyl)phenyl isothiocyanate (PITC 311) (Bures et al. 1994; Hess et al. 1994) and is further detailed in an article by Bures et al. in this issue.

The chemistry is compatible with commonly used adsorptive protein and peptide sequencing protocols. ESI-MS detection of the generated 4-(3-pyridinylmethylaminocarboxypropyl)phenyl thiohydantoins (311 PTHs) affords detection limits at the low femtomole level and provides significant data enhancement by selected ion monitoring. Fig. 3 shows results from a high-sensitivity sequencing experiment using PITC 311 and illustrates the value of integrating the different types of data obtained with the protein chemistry workstation. Bovine carbonic anhydrase (3.2 pmol, calibrated by quantitative amino acid composition analysis) was cleaved with trypsin, and the resulting peptide fragments were separated by RP-HPLC and manually collected; 10% of each eluting peptide sample was split into the on-line ESI-MS system. A peptide of 973.5 Da was subjected to automated sequencing using the PITC 311 chemistry, the 311 PTHs were analyzed by ESI-MS, and the data are represented in histogram format in Fig. 3. The residue in each sequencing cycle (marked with * in Fig. 3) was easily determined. Furthermore, agreement between the sum of the masses of the identified amino acid residues (plus one water molecule for the free termini) and the experimentally determined molecular weight of the intact peptide helped confirm the determined amino acid sequence.
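The two data-handling steps just described, calling a residue in each cycle and checking the called sequence against the measured mass of the intact peptide, can be sketched as follows. The input format (one dictionary of selected-ion signals per cycle, keyed by one-letter code) and the "largest increase over the previous cycle" calling rule are assumptions for illustration, not the published PITC 311 data processing; residue masses are standard monoisotopic values and the example signals are invented.

```python
# Monoisotopic residue masses (Da); Leu and Ile are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def call_residues(cycles):
    """Call one residue per degradation cycle from per-residue ion signals.

    cycles: list of dicts, one per cycle, mapping a one-letter code to the
    selected-ion signal of that residue's derivative (hypothetical format).
    The call is the residue whose signal rose most relative to the previous
    cycle, which discounts carry-over from incomplete earlier cycles.
    """
    calls, previous = [], {}
    for signals in cycles:
        best = max(signals, key=lambda aa: signals[aa] - previous.get(aa, 0.0))
        calls.append(best)
        previous = signals
    return "".join(calls)


def check_against_intact_mass(sequence, measured_mass, tolerance=0.5):
    """Compare a called sequence with the measured neutral mass of the intact peptide.

    Returns (agrees, calculated_mass, unexplained_mass); a large positive
    unexplained_mass suggests residues that have not been called yet.
    """
    calculated = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    difference = measured_mass - calculated
    return abs(difference) <= tolerance, calculated, difference


# Invented signals for a three-cycle run.
cycles = [
    {"G": 100.0, "A": 5.0},
    {"G": 40.0, "A": 80.0},
    {"G": 10.0, "A": 30.0, "K": 90.0},
]
sequence = call_residues(cycles)
print(sequence)  # GAK
print(check_against_intact_mass(sequence, 274.2))
```

Because Leu and Ile have identical residue masses, the intact-mass check cannot distinguish between them; it confirms composition and completeness, not order.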


SUMMARY: PROTEIN IDENTIFICATION

We have developed a protein chemistry workstation for the rapid, sensitive and conclusive identification of proteins separated by gel electrophoresis. The system operates on a two-pass basis. In the first pass, proteins are enzymatically fragmented and peptide molecular weights are determined by LC-ESI-MS or CE-ESI-MS. These peptide masses are used to search sequence databases for the corresponding protein sequences. The second pass, required in cases in which the peptide mass database search is inconclusive, consists of automated sequencing of the collected peptides using PITC 311 and detection of the 311 PTHs by ESI-MS. Protein identification in the first pass is fast and simple, does not require any peptide sequencing, and is growing in importance as sequence databases increase in size. Peptide sequencing with PITC 311 in the second pass is currently slightly more sensitive than peptide sequencing with PITC and yields less ambiguous results owing to accurate mass analysis of the 311 PTHs. We anticipate that future developments in the PITC 311 degradation chemistry will make chemical peptide sequencing faster, more robust and at least one order of magnitude more sensitive. Finally, the data obtained in the first and second passes are synergistic. Integrating these data is useful for selecting peptides for sequencing, for confirming the obtained peptide sequences, and for minimizing the chance of sequencing inhomogeneous peptide fractions or peptides derived from autolysis of proteolytic enzymes.
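How first-pass data might guide the choice of peptides for the second pass can be sketched as a filter over the collected fractions. In this hypothetical Python sketch, a fraction is sent for sequencing only if one mass dominates its on-line ESI-MS signal (the 80% purity threshold is an assumption) and that mass does not match a user-supplied list of protease autolysis peptide masses; the dominant mass is returned so that the resulting sequence can later be checked against it. All fraction data and the autolysis mass are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Fraction:
    fraction_id: str
    # (neutral mass in Da, ion intensity) for each species seen by the on-line ESI-MS
    species: list = field(default_factory=list)


def select_for_sequencing(fractions, autolysis_masses, min_purity=0.8, tolerance=0.5):
    """Return (fraction_id, expected_mass) for fractions worth sequencing.

    A fraction qualifies if its most intense species carries at least
    min_purity of the total signal (a homogeneity proxy) and its mass does not
    match any known protease autolysis peptide.
    """
    selected = []
    for frac in fractions:
        total = sum(intensity for _, intensity in frac.species)
        if total <= 0:
            continue  # nothing detected in this fraction
        mass, top = max(frac.species, key=lambda s: s[1])
        if top / total < min_purity:
            continue  # inhomogeneous: several peptides co-elute
        if any(abs(mass - m) <= tolerance for m in autolysis_masses):
            continue  # probably a fragment of the protease itself
        selected.append((frac.fraction_id, mass))
    return selected


# Invented fractions and an invented autolysis mass list.
fractions = [
    Fraction("F12", [(1000.0, 900.0), (1200.0, 50.0)]),
    Fraction("F15", [(800.0, 500.0), (850.0, 480.0)]),
    Fraction("F21", [(500.0, 1000.0)]),
]
print(select_for_sequencing(fractions, autolysis_masses=[500.0]))  # [('F12', 1000.0)]
```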


DETERMINATION OF THE MOLECULAR BASIS OF DIFFERENTIAL ELECTROPHORETIC MOBILITIES OF PROTEINS

There are numerous indications that the products of a single gene are translated and processed into different molecular species, which frequently can be resolved by high-resolution gel electrophoresis. Such sets of closely related polypeptides are typically suspected if proteins with comparable mobilities in SDS-PAGE can be resolved by IEF (charge trains in 2DE). While the close structural relatedness can quite easily be verified by 2D western blotting experiments, the molecular basis for the differential mobility is more difficult to elucidate. Since most sequences are determined by DNA sequencing, which does not account for post-translational processing and modifications, there is a definite need for advanced technology to investigate the complete covalent structure of fully processed proteins. Localization of a known modification within a polypeptide sequence, and de novo determination and localization of modified residues, represent the two predominant technological challenges.


LOCALIZATION OF MODIFIED RESIDUES WITHIN A POLYPEPTIDE SEQUENCE

Among the many types of protein modifications described to date (Wold 1981), reversible protein phosphorylation, mainly on serine, threonine and tyrosine residues, is intensely studied because of its essential regulatory role in many physiological processes. To localize sites of protein phosphorylation, we developed the post-translational modification (PTM) analyzer shown in Fig. 4.

The system consists of a micro enzyme reactor (ER), a separation instrument (HPLC or capillary electrophoresis) and an ESI-MS system. All three components are connected on-line. A data system controls the operation of the analyzer and integrates the generated data.

The use of a phosphatase ER for the determination of sites of protein tyrosine
phosphorylation illustrates the operation of the PTM analyzer. Differentially phosphorylated
polypeptides were prepared and enzymatically cleaved as described above. An aliquot of the
recovered peptide mixture was subjected to enzymatic dephosphorylation in the phosphatase
ER, and the reaction products were separated and analyzed by CE-ESI-MS or by LC-ESI-MS,
depending on the configuration of the system. A second aliquot of the same sample was
analyzed in a similar way, except that it was not subjected to enzymatic dephosphorylation.
The MS data from both samples were compared in the data system. Phosphopeptides were
identified by two features: the characteristic change in mobility in the separation system, and
the characteristic change in peptide mass induced by enzymatic dephosphorylation (a reduction
in peptide mass of 80 Da per phosphate group removed). The data shown in Fig. 5 further
illustrate the procedure. The cytoplasmic part of the T cell receptor ζ chain was
phosphorylated in vitro with the tyrosine kinase p56lck as described (Watts et al., 1992). The
phosphoprotein was cleaved with trypsin, and the resulting peptide mixture was analyzed as
described above.

Figure 4. The post-translational modification analyzer.

Figure 5. Determination of the site of protein tyrosine phosphorylation using the PTM analyzer. The
TCR/CD3 ζ chain, phosphorylated in vitro with the tyrosine kinase p56lck, was cleaved with trypsin. The
resulting peptides were analyzed in a PTM analyzer as described in Fig. 4, consisting of the following
components: ER, human tyrosine phosphatase β immobilized on biotin-avidin-methacrylate; RP-HPLC, C18
capillary column; ESI-MS, Model API III mass spectrometer (PE/SCIEX). The total ion current (300-2000 Da) of
peptides after ER exposure is shown. Insets show the mass spectra of the peaks indicated by the arrows. The
characteristic differences in elution time and measured mass identify the two analyzed species as the
phospho- and dephospho-forms, respectively, of the same peptide.
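The comparison performed by the data system can be sketched as follows. This is a minimal illustration, not the authors' software. It assumes that each run has already been reduced to a list of neutral peptide masses with retention times. The tolerance, the minimum retention shift and the feature values are all hypothetical. The sketch flags pairs that meet both criteria named in the text: a mass loss of about 80 Da per phosphate, and a later elution of the dephospho form, as seen on RP-HPLC in Fig. 5. For CE, the sign of the expected mobility change would have to be adjusted; that is also an assumption.

```python
from dataclasses import dataclass

# Assumed monoisotopic mass of HPO3, the group lost on dephosphorylation
# (the "80 Da" shift described in the text).
HPO3 = 79.966


@dataclass
class Feature:
    mass: float       # neutral peptide mass (Da)
    retention: float  # elution time (min)


def find_phosphopeptides(untreated, treated, max_sites=3, tol=0.5, min_shift=0.1):
    """Pair features from the untreated run with features from the
    phosphatase-treated run that are lighter by n * HPO3 (n = 1..max_sites)
    and elute at least min_shift minutes later (RP-HPLC behaviour in Fig. 5).
    tol (Da) and min_shift (min) are hypothetical tuning parameters."""
    hits = []
    for u in untreated:
        for t in treated:
            shift = t.retention - u.retention
            for n in range(1, max_sites + 1):
                if abs((u.mass - t.mass) - n * HPO3) <= tol and shift >= min_shift:
                    hits.append((u, t, n))
    return hits


# Hypothetical feature lists (masses in Da, retention times in min).
untreated = [Feature(1710.99, 27.6), Feature(1200.00, 15.0)]
treated = [Feature(1630.99, 28.0), Feature(1200.00, 15.0)]
for u, t, n in find_phosphopeptides(untreated, treated):
    print(f"{u.mass:.2f} Da -> {t.mass:.2f} Da: {n} phosphate(s) removed")
```

Unchanged peptides, which appear at the same mass in both runs, are ignored because their mass difference is not a multiple of about 80 Da.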

The total ion current shown in Fig. 5 represents all the ions detected within the mass
range 300-2000 Da from the sample that was dephosphorylated in an ER consisting of
immobilized human tyrosine phosphatase β and separated by capillary RP-HPLC (Amankwa
et al., 1995). The ER was constructed by immobilizing a metabolically biotinylated human
tyrosine phosphatase β fusion protein on avidin-modified Sepharose packed in a capillary
column. Mass spectral analysis of two peaks eluting at around 28 min indicated that the
later-eluting peptide [(M+2H)2+ = 816.5] was the phosphatase product (the dephospho form)
of the slightly earlier-eluting phosphopeptide [(M+2H)2+ = 856.5]. A comparative, systematic
analysis of all the peptides present in the samples with or without phosphatase treatment allowed
us to localize five additional phosphorylation sites in the ζ chain sample. ERs for CE-MS were
constructed in a similar way by immobilizing the same phosphatase fusion protein onto the
inner surface of fused-silica capillaries coated with avidin, as described (Amankwa and Kuhr,
1992). We used this type of ER to identify a site of tyrosine phosphorylation on the human
platelet-derived growth factor receptor (Amankwa et al., 1995).
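Both ions were observed as doubly charged species, so their m/z values differ by only 40. Converting them to neutral masses makes the 80 Da phosphate loss explicit. The following is a minimal worked check that uses the standard proton mass. The function name is illustrative only.

```python
PROTON = 1.007276  # proton mass (Da)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M of an (M + zH)z+ ion observed at the given m/z."""
    return z * (mz - PROTON)


phospho = neutral_mass(856.5, 2)    # earlier-eluting phosphopeptide
dephospho = neutral_mass(816.5, 2)  # later-eluting phosphatase product
print(f"{phospho:.2f} {dephospho:.2f} {phospho - dephospho:.2f}")
# prints: 1710.99 1630.99 80.00
```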
The system has the following advantages and limitations. (i) Two independent criteria, a shift
in retention time and the characteristic mass reduction of 80 Da, conclusively identify
phosphopeptides in a peptide mixture. (ii) If the sequence of the polypeptide and the specificity
of the proteolytic enzyme used are known, the mass of the unphosphorylated peptide is usually
sufficient to unambiguously locate the phosphopeptide within the polypeptide sequence. (iii)
The system operates in an automated manner at low-picomole sensitivity and does not require
radiolabeling of the sample. (iv) The substrate specificity of the immobilized phosphatase
distinguishes between serine-, threonine- and tyrosine-phosphorylated peptides. (v) In peptides
that contain more than one potentially phosphorylated site, the exact sites of phosphorylation
need to be determined by phosphopeptide sequencing (Aebersold et al., 1991; Meyer et al.,
1991; Wettenhall et al., 1991). (vi) The approach is general and can be extended to the analysis
of essentially any protein modification for which a distinguishing enzyme reaction is known.
(vii) The approach is easily adaptable to other types of analytical instrumentation, such as
matrix-assisted laser desorption/ionization time-of-flight MS.
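Point (ii) can be illustrated with a small in-silico digest. If the dephosphorylated peptide mass is matched against the theoretical tryptic peptides of a known sequence, the peptide can be located within that sequence. This is a minimal sketch under several assumptions:

- It uses the usual trypsin rule (cleave after K or R, but not before P).
- It uses monoisotopic masses and assumes no other modifications.
- It allows no missed cleavages by default.
- The sequence and the tolerance are hypothetical.

Average masses, such as those tabulated later in this volume, would need a different residue table.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_digest(seq, missed=0):
    """Cleave C-terminal to K or R unless the next residue is P.
    Returns (start, end, peptide) tuples with 1-based inclusive positions."""
    sites = [i + 1 for i, aa in enumerate(seq[:-1]) if aa in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed, len(bounds))):
            peptides.append((bounds[i] + 1, bounds[j], seq[bounds[i]:bounds[j]]))
    return peptides


def peptide_mass(pep):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in pep) + WATER


def locate(seq, observed_mass, tol=0.02, missed=0):
    """Theoretical tryptic peptides whose mass lies within tol of observed_mass."""
    return [p for p in tryptic_digest(seq, missed)
            if abs(peptide_mass(p[2]) - observed_mass) <= tol]


print(locate("AAKPRGGKAA", 260.15))  # hypothetical sequence and mass
# [(6, 8, 'GGK')]
```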


DE NOVO CHARACTERIZATION AND LOCALIZATION OF
MODIFIED RESIDUES

De novo characterization of modified amino acid residues is an important task in
analytical protein biochemistry. Basic research into the structure and function of proteins
continues to uncover novel types of protein modification, the structures of which need to be
analyzed. In the biotechnology industry, the covalent structure of recombinant, overexpressed
proteins needs to be documented, and such proteins frequently carry modified residues. Metabolic
products of pharmaceuticals can modify selected proteins and interfere with their function.
Characterization of modified residues is therefore also an important aspect of pharmacology and
toxicology. Current chemical protein sequencing technology is of very limited value for the
structural characterization of modified residues. We have therefore evaluated the potential of
automated peptide sequencing using PITC 311, combined with ESI-MS detection of the resulting
311 PTHs.

The procedure is illustrated by the results shown in Fig. 6. A synthetic polypeptide
containing the regulatory tyrosine residue of the protein tyrosine kinase p60c-src was cleaved
with trypsin, and the phosphopeptide was isolated by RP-HPLC as described above. To allow
maximum flexibility in the extraction conditions during automated sequencing, the
phosphopeptide was covalently attached to an arylamine-modified polyvinylidene fluoride
membrane (Millipore Corp.) (Coull et al., 1991). The membrane was applied to the protein
sequencer cartridge, and the peptide was sequenced using PITC 311 with degradation protocols
optimized for this chemistry (Nika et al., in prep.). The samples extracted from the sequencer
cartridge were scanned by on-line ESI-MS for the presence of an ion corresponding to 311 PTH
phosphotyrosine [(M+H)+ = 555]. Fig. 6 shows the expected mass in cycles 7 and 8, whereas
the signal is absent in the cycles preceding cycle 7 and in later cycles. A phosphotyrosine
residue was therefore positively identified in cycle 7 of the peptide.
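The cycle assignment can be expressed as a simple rule applied to the extracted-ion signal at m/z 555 from each cycle. This is a sketch only, and several of its choices are assumptions rather than the authors' stated procedure:

- The baseline is estimated as the median over all cycles.
- A cycle counts as a hit when its signal exceeds five times that baseline.
- The residue is assigned to the first hit. Signal in the following cycle is treated as carryover (lag), which is typical of stepwise degradation.

The intensities below are hypothetical.

```python
from statistics import median


def phospho_cycles(signal, factor=5.0):
    """Cycles (1-based) whose extracted-ion intensity exceeds
    factor x the median intensity over all cycles (a crude baseline)."""
    threshold = factor * median(signal)
    return [c for c, s in enumerate(signal, start=1) if s > threshold]


def assign_cycle(signal, factor=5.0):
    """Assign the modified residue to the first cycle in which the ion appears;
    later above-threshold cycles are assumed to be carryover."""
    above = phospho_cycles(signal, factor)
    return above[0] if above else None


# Hypothetical m/z 555 intensities for cycles 1-10 (arbitrary units).
signal = [2, 3, 2, 1, 2, 3, 40, 15, 3, 2]
print(phospho_cycles(signal), assign_cycle(signal))
# [7, 8] 7
```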

We have used the same approach to structurally characterize additional types of
modified residues, including catalytic nucleophiles in the active site of glycosidases that had
been covalently modified with mechanism-based enzyme inhibitors (S. Lawson, D. Tull, S.
Withers, University of British Columbia, unpublished). In these cases, the samples extracted
from the sequencer were analyzed by ESI-MS. An appropriately extended mass range was
scanned for the presence of derivatives of modified amino acid residues (data not shown).
Where the 311 PTH mass did not unambiguously identify the modified residue, the structure
of the derivative was further analyzed by collision-induced fragmentation in a tandem MS
experiment (data not shown).
Figure 6. Determination of the site of protein tyrosine phosphorylation by automated peptide sequencing with ESI-MS
detection. A tyrosine-phosphorylated peptide derived from p60c-src was applied to an automated protein sequencer
and sequenced as described in the text. The sample extracted from the sequencer after each sequencing cycle
was injected onto a C18 RP-HPLC column (Reliasil C18; 1 x 50 mm) (Michrom Bioresources), and the column
effluent was analyzed by ESI-MS. The MS was operated in the multiple ion monitoring mode. Data
corresponding to the extracted mass of 311 PTH phosphotyrosine [(M+H)+ = 555] are displayed for the
sequencing cycles around the detected phosphorylation site.


CONCLUSIONS 

In this manuscript we describe a suite of complementary techniques for the high-sensitivity
determination of the covalent structure of proteins separated by gel electrophoresis. The
described techniques focus on the two tasks which, in our view, will dominate the


Proteins  Separated  by  Gel  Electrophoresis  at  the  Primary  Structure  Level 




work  of  analytical  protein  chemists.  The  first  is  the  identification  of  proteins  relevant  to  a 
biological  system  or  process  at  the  level  of  the  primary  structure  and  the  second  is  the 
determination  and  localization  of  modified  or  unnatural  residues  within  an  amino  acid 
sequence.  For  both  tasks  we  have  developed  two-pass  processes  which  consist  of  a  rapid, 
sensitive  and  simple  initial  screen  which  is  followed,  if  necessary,  by  a  more  detailed,  slower, 
less  sensitive  but  general  and  conclusive  secondary  analysis. 

Our strategy to identify proteins at the sequence level takes advantage of rapidly growing
sequence databases and of the rapidly evolving capabilities of protein and peptide mass
spectrometry. It also includes a new chemistry for amino acid sequencing at enhanced
sensitivity.

The technique for the localization and identification of modified residues within
polypeptides combines the specificity of enzyme reactions with the sensitivity and reliability
of LC-MS and CE-MS analysis. For de novo identification of modified residues we rely on
chemical stepwise peptide degradation using a novel protein sequencing chemistry.

The  described  systems  are  modular.  They  consist  of  several  components  which  are 
easily  interfaced.  Individual  components  can  be  easily  exchanged  without  interfering  with 
the  performance  of  the  system.  For  example,  the  described  systems  are  compatible  with  a 
variety  of  gel  electrophoresis  techniques,  with  essentially  any  separation  technique  and  with 
different  MS  techniques  and  instruments. 

The central part of all the described instruments is a data system that stores and
integrates the data generated by the subsystems and allows data analysis and interpretation
in a synergistic manner. Clearly, it is these data interpretation and analysis aspects, together
with systems integration, that will require significant research and development efforts to
make protein analytical technology even more powerful.
INTRODUCTION

One-dimensional (1-D) or two-dimensional (2-D) polyacrylamide gel electrophoresis
is a convenient technique for purifying small amounts of proteins from very complex mixtures
(O’Farrell, 1975; Celis & Bravo, 1984). For structural analysis, proteins are electrotransferred
from the gels onto immobilizing membranes for subsequent NH2-terminal sequence analysis
(Vandekerckhove et al., 1985; Aebersold et al., 1986; Matsudaira, 1987). Alternatively,
proteins can be cleaved either as membrane-bound molecules (Aebersold et al., 1987; Bauw et
al., 1988) or while still present in the gel matrix (Rosenfeld et al., 1992). The resulting peptides
are then separated for further characterization. Recently, computer search algorithms have
been developed that use peptide mass fingerprinting to identify proteins whose sequences are
stored in databases (Mann et al., 1993; Pappin et al., 1993; Yates et al., 1993). Such peptide
mass information can be obtained in two ways. It can come from unseparated mixtures using
matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS)
(Mann et al., 1993; Zhang et al., 1994). Alternatively, it can come from a reversed-phase column
eluate connected on-line to an electrospray ionization mass spectrometer (ESI-MS).
Conventional automated Edman degradation techniques and mass spectrometry-based methods
allow sample analysis in the low picomole or even femtomole range. Unfortunately, when only
such small amounts are present in the starting mixture, it is difficult to purify and digest the
protein with high peptide recoveries. This is probably because a fraction of the protein is
adsorbed within the pores of the immobilizing membrane, where it is trapped out of reach of
the protease. In the case of in-gel cleavage, a fraction of the protein or its fragments stays
inside the gel matrix. These problems can be reduced by working with small but highly
concentrated protein spots, which reduces the membrane surface or gel matrix volume. When
only small amounts of protein are present in the gels, these conditions are seldom met, and it
is therefore necessary to combine several protein spots into a small volume. In this paper we
describe a method that allows proteins to be reproducibly eluted from combined gel pieces and
concentrated. The concentration factor can be greater than 50, and the protein is concentrated
into an agarose gel. The protein is then melted out of the agarose gel prior to proteolytic
cleavage, so that digestion proceeds in a soluble phase. The overall peptide yields are at least
70% of those obtained from direct cleavage in free solution. This approach does not suffer from
the problems of adsorption or in-gel trapping. It should therefore be preferable to
membrane-bound or in-gel digestion. In addition, the technique is amenable to miniaturization.
Here we report our initial experiences with this technique. We show that it can serve as an
efficient link between polyacrylamide gel purification and protein identification by
microsequencing or mass spectrometry in the very low picomole range.
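The principle behind peptide mass fingerprint searching can be illustrated with the simplest possible score. For each candidate protein, the score counts how many observed peptide masses fall within a tolerance of that protein's theoretical tryptic masses. The published algorithms cited above use more elaborate scoring schemes, so this is only a sketch. The 2.5 Da tolerance and the decoy entry "protein_X" are hypothetical. The actin values are the theoretical and found masses listed later in Fig. 4.

```python
def count_matches(observed, theoretical, tol):
    """Number of observed masses lying within tol (Da) of any theoretical mass."""
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def rank_candidates(observed, database, tol=2.5):
    """Rank database entries by the number of matched peptide masses."""
    scores = {name: count_matches(observed, masses, tol)
              for name, masses in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Theoretical actin peptide masses (Fig. 4); "protein_X" is a hypothetical decoy.
database = {
    "actin": [515.6, 630.7, 643.7, 733.8, 794.9, 800.0, 976.0, 1130.2, 1171.4],
    "protein_X": [600.3, 912.4, 1402.7],
}
# Masses found by MALDI-TOF-MS for the same actin peptides (Fig. 4).
observed = [518.0, 633.0, 646.0, 735.5, 796.2, 801.2, 976.6, 1130.3, 1171.4]
print(rank_candidates(observed, database))
# [('actin', 9), ('protein_X', 0)]
```

A realistic search would digest every database sequence in silico and would weight matches by how informative each mass is. A raw count favours large proteins.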


MATERIALS  AND  METHODS 

Tosyl-L-phenylalanine chloromethyl ketone-treated trypsin was obtained from Sigma
Chemical Company, St. Louis, USA. Agarose (ultrapure, electrophoresis grade) was from
BRL Life Technologies Inc., Gaithersburg, USA. Recombinant rat liver PFK-2/FBPase-2
was a gift from Dr. Crepin (International Institute of Cellular and Molecular Pathology,
Brussels), and rabbit skeletal muscle actin was prepared as described by Spudich and Watt
(1971). All other chemicals were from Janssen Chimica, Beerse, Belgium; Merck,
Darmstadt, Germany; or Serva, Heidelberg, Germany.

SDS-Polyacrylamide-Gel  Electrophoresis 

Proteins were separated by SDS-PAGE using the mini-gel electrophoresis design
originally published by Matsudaira and Burgess (1978). The gels were 0.5 mm thick. Proteins
were detected by staining for 20 min in a solution of 0.25% (w/v) Coomassie Brilliant Blue
R250 (Serva, Heidelberg, Germany) in 45% (v/v) methanol/9% (v/v) acetic acid. Destaining
was carried out in 5% (v/v) methanol/7.5% (v/v) acetic acid for 2 hours. Protein bands were
excised, collected in Eppendorf tubes and washed in water for 1 hour with agitation. The
washed protein bands were cut into approximately 1 mm x 1 mm pieces and equilibrated for at
least 1 h in 100-200 µl of sample buffer (1% (w/v) SDS, 10% (v/v) glycerol, 50 mM
dithiothreitol, 12 mM Tris/HCl, pH 7.1). Note that the sample buffer does not contain
Bromophenol Blue. The sample is then ready for elution and concentration (see below).

Narrowbore  Reverse  Phase  HPLC  and  Microsequencing 

Peptides were separated by reverse phase HPLC on a C18 Vydac 2.1 mm x 250 mm
column (Separations Group, Hesperia, CA, USA) equilibrated in 0.1% (v/v) trifluoroacetic
acid/5% (v/v) acetonitrile (solvent A). A linear gradient of 5-100% B in 100 min, where
B = 0.1% (v/v) trifluoroacetic acid/70% (v/v) acetonitrile, was used to elute the peptides at a
flow rate of 80 µl/min. The absorbance was monitored at 214 nm using an Applied Biosystems
759A absorbance detector. Peptides were collected by hand in Eppendorf tubes and applied
directly to the gas-phase sequenator (Applied Biosystems model 470A or 477A) equipped with a
120A phenylthiohydantoin-amino acid analyzer.
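Solvent A already contains 5% acetonitrile and solvent B contains 70%, so the "%B" of the gradient program is not the acetonitrile concentration. The small sketch below converts time into the delivered acetonitrile content. It assumes a strictly linear gradient that starts at injection, and it ignores the system dwell volume. The function name is illustrative only.

```python
def percent_acetonitrile(t_min, b_start=5.0, b_end=100.0, duration=100.0,
                         acn_a=5.0, acn_b=70.0):
    """Acetonitrile content (% v/v) delivered at time t_min for a linear
    b_start -> b_end %B gradient, where solvents A and B contain acn_a and
    acn_b % acetonitrile. Dwell volume is ignored (an assumption)."""
    t = min(max(t_min, 0.0), duration)               # clamp to the gradient window
    b = b_start + (b_end - b_start) * t / duration   # %B at time t
    return acn_a + (acn_b - acn_a) * b / 100.0


for t in (0, 50, 100):
    print(t, f"{percent_acetonitrile(t):.3f}")
```

At the start of the run the column therefore sees about 8.25% acetonitrile, rather than 5%.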

On-Line  Electrospray  Ionization  Mass-Spectrometry 

The outlet of the narrowbore HPLC (run as described above) was connected to a
solvent splitter that directed 80% of the eluate into the absorbance detector and 20% into
the mass spectrometer (Fisons/VG Platform, Manchester, UK) via an interface employing a
0.0025 inch i.d. microbore polyetheretherketone tubing inlet. The instrument is equipped
with an electrospray ion source, and m/z ratios were measured with a quadrupole analyzer.
The flow rate of the peptide carrier solvent at the inlet of the interface was 16 µl/min. Droplet
evaporation was achieved with a flow of warm (60°C) dried nitrogen gas. The mass
spectrometer was scanned from m/z 300 to 1300 at a rate of 6 s per scan during the first part of
the chromatogram (between 15 and 45 min after injection). In the second part of the
chromatogram (from 44 min to the end of the run), scans were made every 6 s from m/z 500 to
m/z 1600.
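The 16 µl/min reaching the interface follows directly from the 80/20 split of the 80 µl/min column flow. The sketch below checks this figure and also counts the complete scans acquired in the first acquisition window (15-45 min at 6 s per scan). The function names are illustrative only.

```python
def split_flows(column_flow_ul_min, ms_fraction):
    """Return (MS flow, UV flow) in µl/min for a post-column splitter."""
    ms = column_flow_ul_min * ms_fraction
    return ms, column_flow_ul_min - ms


def scan_count(start_min, end_min, seconds_per_scan):
    """Number of complete scans acquired between two times (min)."""
    return int((end_min - start_min) * 60 // seconds_per_scan)


ms_flow, uv_flow = split_flows(80.0, 0.20)
print(f"MS {ms_flow:.1f} µl/min, UV {uv_flow:.1f} µl/min")
print(scan_count(15, 45, 6.0), "scans in the first segment")
```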

Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry

Actin, concentrated in approximately 6 µl of agarose gel (1.45% agarose, 0.1% SDS,
0.36 M Tris-HCl, pH 8.7), was heated at 80°C for 5 min. The molten gel was mixed with an
equal volume of digestion buffer containing 0.1 M Tris-HCl, pH 8.5, and 0.1 µg trypsin, kept
at 0°C. Rapid mixing produces a fast drop in the sample temperature, limiting the rate of
trypsin autodigestion. The sample was kept at 37°C overnight as a solid gel, and the
digestion was terminated by melting the gel at 80°C. An aliquot of 5 µl was removed from
the molten gel, acidified with 1 µl of 50% trifluoroacetic acid, and mixed with an equal
volume of a saturated solution of α-cyano-4-hydroxycinnamic acid in acetonitrile and
trifluoroacetic acid (1:2 v/v) (Beavis et al., 1992). 2 µl of this mixture was applied to the
sample support and dried. The MALDI mass spectra were recorded in the linear mode with
a Bruker Biflex (Bruker Instruments Inc., Bremen, Germany). The spectra shown represent
the accumulation of 150 single-shot spectra taken with a conventional UV laser (nitrogen,
337 nm) set at an attenuation of 50-35. The acceleration voltage was 28.5 kV, and low
molecular weight ions were deflected with a 1.7 kV pulse of 500 ns.
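The figure of about 770 fmol quoted in the Results for the amount spotted on the MALDI target follows from the dilution steps. This sketch rests on four assumptions:

- It follows the volume sequence given in the Results: 6 µl gel plus 6 µl buffer plus 1 µl TFA, then a 5 µl aliquot plus 5 µl matrix, of which 2 µl is spotted. The Methods above describe acidifying after the aliquot is taken, which would give a slightly lower figure.
- Digestion is complete.
- Mixing is uniform at each step.
- Actin is taken as about 42 kDa.

The same numbers also reproduce the 0.42 µg and the 4/1 substrate to trypsin ratio mentioned in the Results.

```python
def pmol_to_ug(pmol, mw_da):
    """Convert an amount in pmol to a mass in µg for a protein of mw_da (g/mol)."""
    return pmol * 1e-12 * mw_da * 1e6


def amount_on_target(total_pmol, gel_ul=6.0, buffer_ul=6.0, tfa_ul=1.0,
                     aliquot_ul=5.0, matrix_ul=5.0, spotted_ul=2.0):
    """Peptide amount (pmol) spotted, assuming complete digestion and
    uniform mixing at each dilution step (all volumes in µl)."""
    digest_ul = gel_ul + buffer_ul + tfa_ul          # molten digest after acidification
    in_aliquot = total_pmol * aliquot_ul / digest_ul
    return in_aliquot * spotted_ul / (aliquot_ul + matrix_ul)


actin_ug = pmol_to_ug(10, 42_000)   # actin taken as ~42 kDa (assumption)
spotted_fmol = amount_on_target(10) * 1000
print(f"{actin_ug:.2f} µg actin, substrate:trypsin = {actin_ug / 0.1:.1f}:1, "
      f"{spotted_fmol:.0f} fmol spotted")
```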


RESULTS  AND  DISCUSSION 

Protein  in-Gel  Concentration  Procedure 

The construction of the mini-agarose concentration gel is shown in Fig. 1. The gel is
cast between two 10 cm x 9 cm glass plates, separated by spacers 1 cm wide and 0.75 mm
thick and clamped together. The plates are sealed at the vertical edges with molten 2%
agarose. A strip of Whatman 3MM paper is inserted at the bottom and serves as a support for
the lower agarose gel, preventing it from slipping during electrophoresis. The sample well is
formed by a 1 cm wide x 0.75 mm thick spacer set between two parallel spacers, each 0.5 cm
wide x 0.75 mm thick. These spacers are inserted at the center of the glass plates and attached
with adhesive tape at the top edge of the back plate. The volume of the sample well can be
varied by changing the depth of the slot-forming spacers and the height of the agarose
concentration gel. The most convenient construction is shown in Fig. 1.

Figure 1. Construction of the mini agarose concentration gel.

The lower gel is an agarose gel, 2 cm deep (1.45% w/v agarose in 0.36 M Tris/HCl,
pH 8.7, containing 0.1% w/v SDS), poured as a freshly prepared hot molten solution. Once
the agarose has set, it is overlaid with the polyacrylamide stacking gel, composed of 5.45%
acrylamide, 0.13% bisacrylamide, 0.12 M Tris/HCl, pH 6.8, and 0.1% SDS. The stacking gel
reaches up to 1 cm from the top edge of the front plate. When the stacking gel has set, the
central spacer is removed, leaving a well approximately 2 cm high, 1 cm wide and 0.75 mm
thick. The mini concentration gel is then mounted on a small electrophoresis tank
(LKB-Produkter AB, Bromma, Sweden). The slot is filled with gel pieces cut from Coomassie
Blue-stained protein bands of primary mini-polyacrylamide gels and equilibrated with sample
buffer. The sample well can accommodate approximately fifty 1 x 1 mm gel pieces. The
remaining volume is filled with blank gel pieces, also equilibrated in sample buffer, and if
necessary with additional sample buffer, so that the level is 5 mm above the edge of the
stacking gel. The electrophoresis run is started at 100 V, allowing the proteins to elute from
the combined gel pieces. The concentration effect is obtained by a combination of protein
stacking and horizontal compression of the stacked proteins. This is illustrated in Fig. 2,
which shows a series of time-lapse photographs of the migration of a coloured protein through
the mini concentration gel. In this particular experiment, the sample well was filled with blank
gel pieces equilibrated in sample buffer, to which a cytochrome c solution was added. After
running at 100 V, the protein enters the stacking gel as a sharp band between the two vertical
sample spacers (Fig. 2A). At this point the central spacer is re-inserted into the sample well
and the voltage is increased to 150-200 V. The migration of the protein between the spacers is
thus retarded relative to the migration of the solvent front down the sides of the spacers
(Fig. 2B). The slot-forming spacer is removed again when the solvent front on the outside has
passed the end of the two vertical spacers and moves inwards (Fig. 2C). The retarded protein
band is now compressed into a small spot in the stacking gel (Fig. 2D). This spot moves
further into the agarose gel (Fig. 2E). The electrophoresis is stopped when the protein has
migrated approximately 5 mm into the agarose gel (Fig. 2F). Note that the sample buffer used
in this experiment did not contain the tracking dye Bromophenol Blue. When Coomassie Blue
is present in the protein samples, it is well separated from the protein. This is probably due to
an isotachophoretic effect, even though the agarose has no separating capacity.

Figure 2. Time-lapse photographs of the migration of a protein through the agarose concentration gel.

Several points are important for obtaining good protein concentration:

1. Protein concentration is controlled by re-inserting the slot-forming spacer during
electrophoresis. As a result, the protein migrating between the central spacers falls
approximately 1 cm behind the front in the stacking gel before it leaves the sample well
(see below).

2. Control of the pH difference between the sample and the stacking gel is important
for obtaining good sample concentration. The sample pH is 7.1 and that of the stacking
gel is 6.8. The polyacrylamide gel pieces must therefore be washed extensively with
distilled water to remove residual acetic acid from the destaining solution before
equilibration in sample buffer.

3. The gel pieces should be kept as small blocks and not crushed, in order to avoid
clogging the sample slot and trapping air bubbles formed during electrophoresis.

4. It is important that the sample well is filled completely with gel pieces. If the
protein-containing gel pieces do not completely fill the well, blank gel pieces can
be added to build up the level.

Following fixation and staining with Coomassie Blue, the protein is seen as a spot of
approximately 2 mm², representing a concentration factor of about 50-fold. The gel is
carefully washed with distilled water to remove excess acid. The spot is then excised in a
minimal volume of agarose gel (approximately 5 µl) and transferred into an Eppendorf tube.
The agarose gel is melted in 50 µl of digestion buffer (0.1 M Tris/HCl, pH 8.7, 0.2%
n-octyl-β-glucoside) by heating at approximately 80°C for approximately 2 min, and then
cooled to 37°C. Trypsin (0.1 µg in 1 µl) is added, and the digestion is carried out overnight at
37°C. Following digestion, the peptide mixture is frozen at -80°C for at least 1 h and thawed.
It is then centrifuged at full speed for 5 min in an Eppendorf centrifuge to remove the
precipitated agarose. The supernatant is used for mass spectrometry and narrowbore HPLC
analysis.

Figure 3. Tryptic peptide profile of recombinant rat PFK-2/FBPase-2 analysed by on-line narrowbore
HPLC/ESI-MS.

Characterization  of  in-Gel  Concentrated  Proteins 

The procedure described above is illustrated using two proteins: recombinant rat
6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase (PFK-2/FBPase-2) and skeletal
muscle actin. In the first experiment, recombinant PFK-2/FBPase-2 was subjected to mini
SDS-PAGE. A total amount of 3.5 µg (63 pmol) was divided over five wells of the primary
gel. After Coomassie Blue staining, the protein bands were excised, combined, concentrated
in an agarose gel and digested with trypsin in the molten gel. The peptide solution was frozen
at -80°C and thawed. The agarose in the pellet fraction was separated from the peptide mixture
in the supernatant by centrifugation in an Eppendorf centrifuge. Peptides were separated by
narrowbore reversed phase HPLC. A solvent splitter directed 80% of the eluting peptides into
the UV detector and 20% into the mass spectrometer. The UV absorbance profile is shown in
Fig. 3A. The m/z spectra of some selected peptides are shown in Figs. 3B and 3C. Peptides
whose molecular weights could be deduced from the scans are listed in Fig. 3. Of the 23
peptides measured, 21 were assigned as fragments of the recombinant protein (Crepin et al.,
1989). This was further confirmed by NH2-terminal sequence analysis (results not shown).
The two remaining peptides are probably trypsin autodegradation products.

Tryptic peptides of PFK-2/FBPase-2

Number   Position   Theoretical mass (Da)   Found mass (Da)
1        231-238     968.0                   967.4
7        55-60       711.8                   712.9
9        357-360     656.8                   658.6
10       269-278     932.2                   932.0
11       448-457    1186.3                  1186.0
12       292-299     818.9                   818.7
13       3-11       1064.2                  1063.0
15       258-266    1054.1                  1054.5
17       122-136    1646.7                  1646.1
17       364-373    1195.2                  1195.8
18       74-82      1152.3                  1153.3
20       458-470    1444.5                  1444.0
21       407-415    1051.1                  1050.4
22       15-28      1596.8                  1596.0
23       153-171    2194.5                  2193.0
24       64-73      1192.4                  1191.7
25       308-323    1834.1                  1834.2
26       89-104     2013.1                  2012.4
29       374-383    1228.5                  1228.3
30       31-52      2303.7                  2305.6
32       281-291    1329.5                  1328.7

Figure 3. Continued.
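The positions in the table of Fig. 3 can be turned into a sequence-coverage figure for the 21 assigned peptides. This is a minimal sketch. The full length of the recombinant protein is not stated in this section, so coverage_fraction takes it as a parameter rather than assuming a value.

```python
# (start, end) positions of the 21 assigned PFK-2/FBPase-2 peptides (Fig. 3).
ASSIGNED = [
    (231, 238), (55, 60), (357, 360), (269, 278), (448, 457), (292, 299),
    (3, 11), (258, 266), (122, 136), (364, 373), (74, 82), (458, 470),
    (407, 415), (15, 28), (153, 171), (64, 73), (308, 323), (89, 104),
    (374, 383), (31, 52), (281, 291),
]


def covered_residues(intervals):
    """Set of residue positions covered by inclusive (start, end) intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def coverage_fraction(intervals, protein_length):
    """Fraction of a protein of the given length covered by the intervals."""
    return len(covered_residues(intervals)) / protein_length


print(len(ASSIGNED), "peptides,", len(covered_residues(ASSIGNED)), "residues covered")
# 21 peptides, 238 residues covered
```

Using a set rather than summing the interval lengths keeps the count correct if overlapping peptides (for example, partial-digestion products) are added.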

This experiment provides important information for the future development and
application of this technique. First, proteins available in the 50 picomole range (here we
refer to the total amount loaded on the primary gel) can be recovered from combined gel pieces
and digested in the agarose solution with a high overall recovery. In the example shown,
peptides are recovered with approximately 75% yield. Second, the resolution of the HPLC
peptide separation is not significantly affected by the small amounts of agarose polymer left in
the peptide supernatant after agarose precipitation. Third, on-line electrospray mass
spectrometry is likewise not strongly affected by any remaining agarose components. It
should be mentioned, however, that some low molecular weight components are regularly
observed in the mass spectra of some peptides (see, for instance, Fig. 3C). This may impose a
lower detection limit when ESI-MS analysis is used. The use of capillary chromatography will
undoubtedly improve the sensitivity. However, when sufficient peptide has to be saved for
subsequent sequence analysis, the practical lower limit of the procedure is expected to be
around 30-50 pmol. Fourth, the presence of agarose in the digestion mixture does not result in
NH2-terminal blocking of the peptides. This follows from a comparison of the initial yields of
the phenylthiohydantoin amino acids of the peptides. These yields did not differ in the molten
agarose from those obtained in free solution (results not shown).

Tryptic peptides of skeletal muscle actin

Position   Theoretical mass (Da)   Found mass (Da)   Mass difference (Da)
209-212     515.6                   518.0            +2.4
194-198     630.7                   633.0            +2.3
180-185     643.7                   646.0            +2.3
287-292     733.8                   735.5            +1.7
331-337     794.9                   796.2            +1.3
64-70 *     800.0                   801.2            +1.2
21-30       976.0                   976.6            +0.6
199-208    1130.2                  1130.3            +0.1
42-52      1171.4                  1171.4            +0.0
31-41      1198.4                  1198.3            -0.1
362-374    1500.6                  1500.2            -0.4
87-97      1515.7                  1515.1            -0.6
241-256    1790.9                  1790.3            -0.6
98-115     1956.2                  1955.4            -0.8
21-41 *    2156.4                  2156.7            +0.3
294-314    2246.5                  2246.4            -0.1
293-314    2374.7                  2374.9            +0.2

Autodigestion products of trypsin

Position   Theoretical mass (Da)   Found mass (Da)   Mass difference (Da)
108-115     842.0                   843.1            +1.1
98-107     1045.1                  1045.7            +0.6
158-178    2158.5                  2156.7            -1.8
58-77      2211.4                  2210.7            -0.7
78-97      2283.6                  2283.9            +0.3

Figure 4. MALDI mass spectrometric peptide map of a tryptic digest of actin.

Position   Theoretical mass (Da)   Found mass (Da)   Mass difference (Da)
108-115     842.0                   843.5            +1.5
209-216     906.1                   906.9            +0.8
98-107     1045.1                  1045.9            +0.8
158-178    2158.5                  2157.0            -1.5
58-77      2211.4                  2210.9            -0.5
78-97      2283.6                  2284.2            +0.6

Figure 4. Continued.

In the second
experiment  we  have  mainly  used  the  agarose  concentration  gel  approach  with  respect  to  the 
problem  of  protein  identification  by  tryptic  peptide  mass  fingerprinting  using  as  few  protein 
as  possible.  Now  the  protein  was  digested  in  a  smaller  volume  and  the  peptide  mixture  was 
not  separated  from  the  agarose  for  subsequent  MALDI-TOF-MS  analysis.  Ten  pmol  (0.42 
pg)  of  actin  were  recovered  in  6  pi  agarose  gel  (1.45%  agarose).  This  was  melted  at  80°C 
and  mixed  with  an  equal  volume  of  digestion  buffer  containing  0. 1  pg  of  trypsin  kept  at  0°C. 
By  rapid  mixing  the  temperature  is  immediately  shifted  to  about  40°C,  reducing  heat 
denaturation  of  the  trypsin.  The  agarose  (final  concentration  0.7%)  stays  solid  at  this 
temperature.  The  digestion  is  continued  overnight  at  37°C  and  terminated  by  heating  at  80°C 
and  acidification  with  1  pi  of  50%  TFA  in  water.  An  aliquot  (5  pi)  of  the  molten  phase  is 
removed,  mixed  with  5  pi  of  the  matrix  solution  and  2  pi  is  taken  for  analysis.  The  amount 
analysed  corresponds  with  770  fmol  of  peptide,  assuming  complete  cleavage  of  the  protein. 
The  corresponding  MALDI-TOF-MS  spectrum  is  shown  in  figure  4A.  When  compared  with 
the  blank  analysis  of  trypsin  (Fig  4B)  we  notice  17  peptides  which  could  be  assigned  to  the 
actin  sequence  (3  of  these  peptides  resulted  from  partial  digestion).  This  information  is 
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sufficient  to  identify  the  protein  by  molecular  weight  search  in  the  tryptic  peptide  mass 
database.  The  high  amount  of  trypsin  autodigestion  fragments  is  not  surprising  since  the 
substrate  to  trypsin  ratio  used  in  this  experiment  was  4/1  and  trypsin  denaturation  is  likely 
in  view  of  the  use  of  0.05%  SDS  during  the  digestion  and  the  possible  heat  denaturation 
resulting  from  sample  mixing. 
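
The amount analysed follows from the successive dilutions described above. This Python sketch reproduces that arithmetic under the text's own assumption of complete cleavage (each peptide present at the same molar amount as the protein); the variable names are ours, and the 42 kDa value for actin is the approximate mass quoted in the conclusions.

```python
# Dilution arithmetic for the digestion of actin in solid agarose.
# Volumes and amounts are those given in the text; complete cleavage
# is assumed, as in the text.

protein_pmol = 10.0   # actin recovered in the agarose gel
actin_kda = 42.0      # approximate molecular mass of actin
trypsin_ug = 0.1      # trypsin in the digestion buffer

gel_ul, buffer_ul, tfa_ul = 6.0, 6.0, 1.0      # gel + equal volume of buffer + 50% TFA
aliquot_ul, matrix_ul, spotted_ul = 5.0, 5.0, 2.0

digest_ul = gel_ul + buffer_ul + tfa_ul                    # total digest volume (ul)
pmol_in_aliquot = protein_pmol * aliquot_ul / digest_ul    # peptide in the 5 ul aliquot
analysed_fmol = 1000 * pmol_in_aliquot * spotted_ul / (aliquot_ul + matrix_ul)

actin_ug = protein_pmol * actin_kda / 1000                 # pmol x kDa = ng; /1000 -> ug
ratio = actin_ug / trypsin_ug                              # substrate:enzyme (w/w)

print(f"each peptide analysed: ~{analysed_fmol:.0f} fmol")
print(f"actin {actin_ug:.2f} ug; substrate:trypsin ~ {ratio:.1f}:1")
```

The result, about 769 fmol, matches the 770 fmol quoted, and the 0.42 µg of actin against 0.1 µg of trypsin gives the roughly 4:1 ratio.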

In the example given here, 10 pmol of actin could be readily identified; however, the quality
of the MALDI-TOF spectra suggests that the starting amount of protein could be decreased
further. Two findings indicate how this might be done. First, protein digestion with trypsin
can proceed with the same efficiency in the gel phase as in solution, and the agarose can be
melted at high temperature either to obtain adequate mixing of substrate and trypsin or to take
aliquots of the digestion mixture for analysis. Second, and probably more importantly,
MALDI-TOF-MS analysis can be carried out successfully on a peptide mixture still embedded in
the agarose gel. This makes it possible to avoid the agarose/peptide separation step, which
takes place in a large volume, and to work in small volumes.

CONCLUSIONS  AND  PERSPECTIVES 

We studied the feasibility of using an agarose gel as a protein-holding matrix in which
proteolytic cleavage can be carried out and from which mass spectrometric peptide analysis can
be started. This approach could serve as an alternative to polyacrylamide in-gel digestion or
on-membrane digestion procedures (Rosenfeld et al., 1992; Aebersold et al., 1986; Bauw et al.,
1988). The study was motivated by the potential advantages of agarose over these other
procedures. First, the agarose can be melted into a liquid phase, which renders the protein
much more accessible for digestion and should make peptide recovery more efficient. A second
advantage is that MALDI-TOF-MS analysis can be performed on peptide mixtures in the presence of
high concentrations of agarose, opening the possibility of high-sensitivity mass spectrometric
protein identification. We have described in detail a procedure to transfer proteins from
combined pieces of stained polyacrylamide gels into an agarose gel. By deforming the electrical
field, the eluting protein can be directed into a highly concentrated spot. In one strategy,
the protein-containing agarose gel is melted and then diluted with digestion buffer, so that
protein cleavage proceeds in the liquid phase at 37°C. The example shown starts from 63 pmol of
protein, and the peptides are subsequently separated by narrowbore HPLC connected on-line to an
electrospray ion source. Peptide yields are sufficient for both peptide mass fingerprinting and
amino acid sequence analysis of individual peptides. The sensitivity of this procedure could be
increased further, e.g. by concentrating the protein in smaller volumes and by using capillary
chromatographic methods. Although these modifications seem feasible, the lower limits will
probably be set by contaminants from the incomplete agarose-peptide separation, which interfere
with ESI-MS. In the second approach we digested the protein in solid agarose. This was achieved
by melting the agarose gel piece at 80°C and then mixing it with an equal volume of cold buffer
containing trypsin, so that the digestion proceeds in solid 0.75% agarose. At the end of the
digestion the matrix is melted again for sample preparation. MALDI-TOF-MS peptide mass analysis
is done in the presence of agarose, avoiding both sample dilution and the agarose-peptide
separation step. This second approach is particularly attractive because it should allow very
small amounts of gel-separated proteins to be characterized. The example shown starts from
10 pmol of a 42 kDa protein. Further miniaturization of the system (e.g. concentrating in 1 µl
volumes) will allow the starting amount of protein to be reduced by a factor of five or more.
At that stage the lower limit will probably be set not by the limits of miniaturization but by
as yet unknown parameters related to interactions with, or modifications by, the primary
polyacrylamide gel or the secondary agarose gel, which may occur at such extremely low protein
amounts. Other problems may be related to staining and destaining procedures and have to be
considered in future experiments. The use of agarose as a holding matrix clearly has some
interesting prospects. While it yields peptides in high amounts for further HPLC separation,
its combination with in-gel MALDI-TOF mass spectrometric analysis of the peptide mixture opens
the possibility of characterizing proteins in quantities that could not previously be reached.
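
Peptide mass fingerprinting, used in both approaches, compares observed masses with masses predicted from an in-silico digest of database sequences. The Python sketch below shows one common way to generate such predictions. It assumes trypsin cleaves after Lys or Arg except before Pro, models partial digestion through optional missed cleavages, and computes average [M+H]+ values from standard average residue masses. The function names are hypothetical, and it is an assumption that the theoretical masses in Figure 4 were calculated with exactly these conventions.

```python
# Hypothetical in-silico tryptic digest with average [M+H]+ masses.
# Cleavage rule: after K or R, but not when the next residue is P.

AVG_RESIDUE = {  # standard average residue masses (Da)
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528   # average mass of H2O added to the residue sum
PROTON = 1.00728   # mass of the proton for the singly charged ion

def tryptic_digest(seq, missed=0):
    """Return (start, end, peptide) tuples with 1-based inclusive positions,
    allowing up to `missed` missed cleavages."""
    sites = [i + 1 for i, aa in enumerate(seq[:-1])
             if aa in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed, len(bounds))):
            start, end = bounds[i], bounds[j]
            peptides.append((start + 1, end, seq[start:end]))
    return peptides

def avg_mh(peptide):
    """Average mass of the singly protonated peptide [M+H]+."""
    return sum(AVG_RESIDUE[aa] for aa in peptide) + WATER + PROTON

for start, end, pep in tryptic_digest("AKRPGRK", missed=1):
    print(f"{start}-{end}  {pep:<8} {avg_mh(pep):8.1f}")
```

For instance, `tryptic_digest("AKRPGRK")` returns `[(1, 2, 'AK'), (3, 6, 'RPGR'), (7, 7, 'K')]`: the Arg followed by Pro is not cleaved. Allowing one missed cleavage adds the longer products, analogous to the partial-digestion peptides such as 21-41 in Figure 4.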
INTRODUCTION

Reversed-phase high-performance liquid chromatography (RP-HPLC) and polyacrylamide gel
electrophoresis (PAGE) are two of the most widely used high-resolution techniques for isolating
proteins and peptides for structural analysis (Simpson et al., 1988, 1989; Matsudaira, 1993;
Patterson, 1994). In recent years, the importance of these two technologies has been further
enhanced by using them in tandem. For example, proteins from complex mixtures such as total cell
lysates can be resolved by two-dimensional gel electrophoresis (2-DE) and, following proteolytic
digestion, the resulting peptides can be separated by microbore RP-HPLC. Proteolytic digestion
of 2-DE gel spots can be accomplished either in the gel matrix (Ward et al., 1990; Eckerskorn et
al., 1990; Rosenfeld et al., 1992; Ji et al., 1994; Hellman, U., personal communication) or,
following electrotransfer from the gel, on immobilizing membranes such as polyvinylidene
difluoride (Fernandez et al., 1992) and nitrocellulose (Aebersold et al., 1987). Proteins can
be identified by microsequence analysis of the isolated peptides using either automated Edman
degradation or tandem mass spectrometry (Hunt et al., 1986; Burlingame et al., 1994). More
recently, alternative means of protein identification such as peptide mass fingerprinting
(Pappin et al., 1993; Mann et al., 1993; Henzel et al., 1993; James et al., 1994) and amino
acid compositional analysis (Sibbald et al., 1991; Shaw, 1993) have emerged. Since these latter
techniques can generate large quantities of data rapidly, there is now an increasing need for
rapid protein and peptide isolation procedures.

Although the concept of high-speed RP-HPLC using linear velocities in the range 1000-5000 cm/h
(i.e., 0.5-3.0 ml/min for a 2.1-mm ID column) has been documented previously (Kalghatgi and
Horvath, 1987, 1988; Nugent, 1990; Fulton et al., 1991; Regnier, 1991), this technology has
been slow to gain general acceptance in the biosciences. This has been due to a number of
shortcomings in the methodology, foremost of these being stationary phase design. Initial
attempts at designing stationary phase materials that would meet the fundamental criteria of
fast protein chromatography, such as good solvent permeability and constant retention
behaviour, led to the development of non-porous stationary phase packings (Unger et al., 1986;
Kalghatgi and Horvath, 1987; Yamasaki et al., 1989). With these packings, “diffusion” involving
the intraparticle pores is eliminated and solvent passage is restricted to interparticle
“convective flow”. Since “diffusion”, which results in slow mass transfer of analyte within
particle pores (into and out of stagnant pools), is the major contributor to peak broadening
(Fig. 1), non-porous packings show minimal loss of peak resolution over a broad range of flow
rates. However, the loss of surface area that results from eliminating the pores gives these
packings an inferior binding capacity (~1.0 mg/g) compared with conventional wide-pore (300 Å)
silica-based packings (~36 mg/g). Moreover, although packed beds of non-porous packings are
stable at very high pressures (typically 6000 psi), these packings exhibit very high pressure
drops across the column, which restricts their use at very high linear flow velocities
(Yamasaki et al., 1989; Nice and Simpson, 1989; Rozing and Goetz, 1989).

Figure 1. Frontal loading adsorption isotherms of conventional “wide pore” derivatised silica (Brownlee RP-300) and macroporous divinylbenzene crosslinked polystyrene (Poros RII/H) supports. Protein: lysozyme, 1 mg/ml solution in aqueous 0.1% TFA. Superficial linear flow velocities: 173, 347, 866, 1732 and 3465 cm/h. Temperature: 25°C. (A) RP-300 2.1 mm ID cartridge. (B) Poros RII/H 2.1 mm ID column. Adapted and reproduced with permission from Moritz et al., 1994.
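
The flow-rate equivalents quoted for a 2.1-mm ID column follow from the definition of superficial linear velocity, u = F/(πr²): the volumetric flow divided by the cross-sectional area of the empty column. A small Python sketch of the conversion in both directions (the function names are hypothetical):

```python
import math

def linear_velocity_cm_h(flow_ml_min, column_id_mm):
    """Superficial linear velocity (cm/h) = volumetric flow / empty-column area."""
    radius_cm = column_id_mm / 20.0              # diameter in mm -> radius in cm
    area_cm2 = math.pi * radius_cm ** 2
    return flow_ml_min * 60.0 / area_cm2         # ml/min -> cm^3/h, divided by cm^2

def flow_ml_min(velocity_cm_h, column_id_mm):
    """Inverse conversion: linear velocity (cm/h) to flow rate (ml/min)."""
    radius_cm = column_id_mm / 20.0
    return velocity_cm_h * math.pi * radius_cm ** 2 / 60.0

ID_MM = 2.1
for f in (0.1, 0.2, 0.5, 1.0, 2.0):
    print(f"{f:.1f} ml/min -> {linear_velocity_cm_h(f, ID_MM):6.0f} cm/h")
for u in (1000, 5000):
    print(f"{u} cm/h -> {flow_ml_min(u, ID_MM):.2f} ml/min")
```

This reproduces the 173, 866, 1732 and 3465 cm/h values used in the figure legends to within rounding, and shows that 1000-5000 cm/h corresponds to roughly 0.58-2.9 ml/min on a 2.1-mm ID column.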

Many of the problems encountered with non-porous packings have been circumvented with the
design of macroporous packings, such as “perfusive” stationary phases (Afeyan et al., 1990) and
“hyperdiffusive” packings (Boschetti, 1994). The salient features of these packings are (i) very
large pore diameters (>8000 Å), (ii) high binding capacity (compared with non-porous packings),
and (iii) maintenance of resolution (i.e., chromatographic efficiency) over a broad range of
linear flow velocities (1000-5000 cm/h; i.e., 0.5-3.0 ml/min for a 2.1-mm ID column). While the
“perfusive” packings are derived from divinylbenzene cross-linked polystyrene (PS-DVB), the
“hyperdiffusive” packings are a soft agarose gel encased in a rigid PS-DVB spherical lattice.
Both of these packings are robust in the practical operating range of flow rates, but at very
high flow rates their fragile nature limits their utility compared with some of the widely used
mesoporous silica packings (e.g., 300 Å).

The principal advantages that macroporous packings afford over conventional silica-based
reversed-phase packings are purported to be their ability to operate at very high linear flow
velocities (1000 cm/h) while maintaining both high sample loading capacity and increased
chromatographic resolution (Kassel et al., 1994). Here, we report a protocol for fast
chromatographic analysis (<12 min) of proteins and peptides using a conventional 300 Å, 7-µm
silica-based support and standard liquid chromatographs. Using an inexpensive conventional
2.1-mm “wide-pore” reversed-phase cartridge and rapid linear flow velocities of 500-1000 cm/h
(0.3-0.6 ml/min), highly reproducible separations can be achieved in 10-12 min, almost an
order of magnitude faster than under standard chromatographic conditions, without any serious
compromise in chromatographic efficiency.


MATERIALS  AND  METHODS 

Bovine serum albumin, bovine pancreatic ribonuclease-B, hen egg lysozyme, horse heart
myoglobin, hen egg albumin (ovalbumin), rabbit muscle phosphorylase-b and bovine erythrocyte
carbonic anhydrase were obtained from Sigma (St Louis, MO). Sequence grade trypsin was
purchased from Promega. Coomassie Brilliant Blue R250 (CBR-250) was from LKB-Pharmacia
(Uppsala, Sweden), 10% Tris-glycine acrylamide gels from Novex, trifluoroacetic acid (TFA) from
Pierce, and HPLC grade solvents from Mallinckrodt (Melbourne). High-purity deionized water was
obtained from a tandem Milli-RO 15 and Milli-Q system (Millipore, Bedford, MA). All other
reagents were of analytical grade.

High-Performance  Liquid  Chromatography 

Instrumentation. Protein and peptide mixtures were fractionated by RP-HPLC on a Hewlett Packard
model 1090A liquid chromatograph fitted with a model 1040A diode-array detector. Samples were
injected either by an integrated autoinjector or with a Rheodyne model 7125 injector equipped
with a 2-ml injection loop installed in the heated column compartment. Fractions were collected
manually in 1.5-ml polypropylene tubes (Eppendorf) and stored at -20°C.

Column Supports. The following columns were used in this study: (a) Brownlee RP-300 (300 Å
pore size, 7-µm particle diameter, octylsilica packed into a 100 x 2.1 mm ID cartridge; Applied
Biosystems, Foster City, CA); (b) POROS RII/H (10-µm divinylbenzene crosslinked polystyrene
packed into a 100 x 2.1 mm ID stainless steel column; PerSeptive Biosystems, Cambridge, MA).

SDS-Polyacrylamide  Gel  Electrophoresis 

Phosphorylase-b (Mr 97,000) was separated in 1.0-mm-thick 10% SDS gels (Novex).
Two-dimensional gel electrophoresis (2-DE) of total cell lysates from cultured human colonic
LIM1863 cells (Ji et al., 1993, 1994), with isoelectric focussing (IEF) on precast immobilized
pH gradients (Pharmacia) in the first dimension and SDS-PAGE in the second dimension, was
performed as described (Ji et al., 1994). Proteolytic digestion of gel-resolved proteins was
performed in situ, essentially as described (Ward et al., 1990a, b), with modifications based
on the methods of Rosenfeld et al. (1992) and Hellman et al. (1994).

Step 1. Visualization of proteins with CBR-250. Staining: 50% methanol / 10% acetic acid /
        0.1% CBR-250 (5-10 min). Destaining: 12% methanol / 7% acetic acid for 1-1.5 h
        (~3 changes).

Step 2. In-situ proteolysis: (i) excise the stained protein gel band; (ii) wash twice
        (~200 µl) for 30 min at 30°C with 1% ammonium bicarbonate / 50% acetonitrile;
        (iii) dry the gel band completely by centrifugal lyophilisation (Savant, ~30 min);
        (iv) rehydrate the gel band twice with trypsin-containing solution (~0.5-1.0 µg
        trypsin in 10 µl 1% ammonium bicarbonate / 0.5 mM CaCl2) for 15-30 min; (v) add
        150 µl 1% ammonium bicarbonate containing 0.5 mM CaCl2 and incubate at 37°C for
        ~16 h.

Step 3. Peptide extraction: (i) collect the enzymatic digestion buffer; (ii) add 200 µl of
        1% TFA, sonicate the gel mixture for ~30 min (35-40°C) and collect the extract;
        (iii) add 200 µl of 0.1% TFA / 60% acetonitrile, sonicate the mixture for ~30 min
        at 35-40°C and collect the extract; (iv) concentrate the pooled extracts by
        centrifugal lyophilization to a final volume of 10-20 µl for RP-HPLC.


RESULTS  AND  DISCUSSION 

Protein  Binding  Capacity  and  Mass  Transfer  Kinetics  of  Conventional 
and  Macroporous  Packings 

A conventional wide-pore silica-based reversed-phase column (e.g., Brownlee RP-300, C8, 7-µm,
300 Å particles) was evaluated for its binding capacity and mass transfer kinetics at varying
superficial linear flow velocities and compared directly with a macroporous column (e.g., POROS
RII/H, divinylbenzene cross-linked polystyrene, 10-µm, >8000 Å particles). Frontal analysis
chromatography shows that the total protein binding capacity (saturation level) for lysozyme
was significantly greater (~3-fold) for the conventional packing (~11.5 mg) than for the
macroporous support (~4 mg) (Fig. 1). The protein saturation levels for both conventional and
macroporous packings are independent of linear velocity over the range 173-3465 cm/h. For the
conventional packing, however, the initial binding (or breakthrough) does depend on linear flow
velocity. In contrast to the macroporous packing, marked variation in frontal curve shape is
observed for the conventional silica-based packing. For example, protein breakthrough at
173 cm/h occurs with loads >11 mg, while at a 20-fold higher flow rate breakthrough occurs at
7 mg. This variation in frontal curve shape for the conventional support is indicative of
“stagnant mobile phase mass transfer” (attributable to the large number of inaccessible pockets
in these particles where slow or unmoving mobile phase accumulates and mass transfer occurs by
extended-path-length diffusion) (Fig. 2). This observation has been reported previously for a
wide range of other silica-based supports (Snyder and Kirkland, 1973). The lack of variation in
frontal curve shape for the macroporous packing is due to minimal stagnant mobile phase pool
formation (Fig. 1B), a feature of this packing design.

The amount of protein that can be loaded on a reversed-phase packing depends largely on the
total surface area of the packing. Narrow-pore silica-based porous packings (60 Å) exhibit very
high surface areas (300 m²/g) compared with non-porous packings (<5 m²/g) (Yamasaki et al.,
1989; Esser and Unger, 1991). Conventional wide-pore packings (300 Å), originally designed for
protein and peptide separations, exhibit surface areas of 50-100 m²/g and protein loads of
~35 mg/g. Macroporous packings (>8000 Å) such as the POROS R series exhibit low protein binding
capacity (~5 mg/g) owing to their smaller surface area. Later attempts to increase the binding
capacity by re-introducing short-path-length pores led to a modest (2-fold) increase in the
total protein binding capacity (~10 mg/g; data obtained from the manufacturer). With the
exception of techniques such as displacement chromatography, efficient separation of proteins
and peptides on reversed-phase packings typically requires sample loads of <5% of the total
capacity. Under these conditions, deleterious slow mass transfer kinetics due to
extended-pathlength diffusion into stagnant pools of mobile phase become less pronounced; with
higher sample loads, column overloading can produce poor peak shapes (Snyder and Kirkland,
1973).

Figure 2. Physical factors which contribute to peak broadening in porous HPLC packings. (A) Longitudinal diffusion - results from normal diffusion of molecules in the liquid medium and is more pronounced at slow flow rates. (B) Eddy diffusion - results from multiple flow paths of differing length between particles in a column. (C) Mobile-phase mass transfer - results from differing flow rates in different parts of a single flow stream between particles in a column. (D) Stagnant mobile-phase mass transfer - results from stagnant or unmoving mobile phase within the pores of a particle. Diffusion resulting from this is thought to be the major contributor to band broadening. (E) Stationary-phase mass transfer - results from molecules penetrating, by diffusion and to varying extents, the stationary phase covering the surface of the particle. Adapted from Snyder and Kirkland, 1979.
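
The <5% loading guideline can be turned into an approximate per-column sample limit. The Python sketch below applies it to the lysozyme saturation levels measured by frontal analysis (~11.5 mg for the RP-300 cartridge, ~4 mg for the POROS RII/H column). Treating these whole-column lysozyme values as the reference capacity, and carrying them over to other proteins, are simplifying assumptions; the function name is hypothetical.

```python
# Apply the "<5% of total capacity" loading guideline to the frontal-analysis
# saturation levels for lysozyme quoted in the text (assumed reference capacity).

SATURATION_MG = {
    "RP-300 cartridge (2.1 mm ID)": 11.5,
    "POROS RII/H column (2.1 mm ID)": 4.0,
}

def max_efficient_load(capacity_mg, fraction=0.05):
    """Largest sample load (mg) that stays below `fraction` of column capacity."""
    return capacity_mg * fraction

for column, capacity in SATURATION_MG.items():
    print(f"{column}: keep loads below ~{max_efficient_load(capacity):.2f} mg")
```

On this basis, loads should stay below roughly 0.6 mg on the RP-300 cartridge and 0.2 mg on the POROS RII/H column, both far above the microgram amounts used in the separations that follow.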

Another aspect of rapid chromatography that warrants careful consideration is the liquid
pumping and data collection capabilities of the instrument. For rapid microbore RP-HPLC, a
binary pumping system capable of precise gradient formation at low flow rates with minimal
system dead volume must be used. To achieve similar chromatographic efficiency within a reduced
time frame, linear flow velocities are increased while the same gradient volume as at lower
linear flow velocities is maintained. Pumping systems that are unable to produce precise
gradients at low linear flow velocities and which incorporate large system dead volumes will
not perform less efficiently at the higher linear flow velocities.

With respect to UV detection, the collected data must give a true representation of the
chromatographic separation. As linear flow velocities are increased while gradient volumes are
maintained, proteins and peptides elute in the same volume fraction of organic modifier as at
slow linear velocities, so the analytes pass through the detector far more rapidly. If the
operating parameters of the UV detector are similar to those used at conventional low linear
flow velocities, there is an increased risk that the recorded chromatogram misrepresents the
separation. To compensate, data collection rates must be increased accordingly (e.g., ~100 ms
for linear flow velocities >3500 cm/h).
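
The need for faster data collection can be made concrete by counting detector data points across a peak. At constant gradient volume a peak elutes in roughly the same volume at any flow rate, so the time it takes to pass through the detector shrinks in proportion to the flow. The Python sketch below assumes a hypothetical 50 µl peak volume and compares a hypothetical 400 ms sampling interval with the ~100 ms suggested above; neither the peak volume nor the 400 ms value comes from the text.

```python
def points_per_peak(peak_volume_ul, flow_ml_min, sampling_interval_s):
    """Detector data points recorded while a peak of fixed elution volume passes."""
    peak_time_s = (peak_volume_ul / 1000.0) / flow_ml_min * 60.0
    return peak_time_s / sampling_interval_s

PEAK_UL = 50.0                       # hypothetical peak volume (ul)
for flow in (0.1, 0.5, 2.0):         # ml/min
    for interval_s in (0.4, 0.1):    # hypothetical 400 ms vs ~100 ms from the text
        n = points_per_peak(PEAK_UL, flow, interval_s)
        print(f"{flow:.1f} ml/min, {interval_s * 1000:.0f} ms: {n:5.1f} points")
```

At 2.0 ml/min (3465 cm/h) such a peak passes in 1.5 s, giving about 15 points at 100 ms sampling but fewer than 4 at 400 ms, too few to define the peak shape.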

Effect  of  Linear  Flow  Velocity  on  Resolution  and  Recovery  of  Proteins 
and  Peptides 

Chromatographic separation of a mixture of six proteins at varying flow rates on conventional
and macroporous packings is shown in Fig. 3. The resolution of these standard proteins appears
to vary little over the range of flow velocities examined. However, close inspection of the
profiles from the conventional reversed-phase packing (compare Fig. 3A and E) reveals a
discernible loss of resolution upon increasing the flow rate from 0.1 to 2.0 ml/min (i.e., 173
to 3465 cm/h). A loss of resolution is also evident for the macroporous packing, but to a lesser
extent (compare Fig. 3F and J). It should be noted that the seemingly lower recoveries at the
higher flow velocities are due to peak broadening. Sample recoveries for a glycoprotein
(ribonuclease-B) on both stationary phases examined in this study are shown in Fig. 4. The alkyl
silica gives good recoveries of 98% per iterative step, with an overall recovery of ~96% after
two reinjections (Fig. 4A). The perfusive stationary phase, however, gives lower recoveries of
94% per reinjection and an overall recovery of 86% after two reinjections (Fig. 4B). For
multidimensional purification strategies, the low protein recoveries obtained from perfusive
packings would be of some concern.

Figure 3. Rapid reversed-phase chromatography of standard proteins. Supports: RP-300 (panels A - E); Poros RII/H (panels F - J). Chromatographic conditions: linear 6-ml gradient of 0-100% B. Solvent A: aqueous 0.1% TFA; Solvent B: aqueous 0.1% TFA containing 60% acetonitrile. Temperature: 45°C. Chromatographic runs were performed at superficial linear flow velocities of 173 cm/h (0.1 ml/min) (A, F), 347 cm/h (0.2 ml/min) (B, G), 866 cm/h (0.5 ml/min) (C, H), 1732 cm/h (1.0 ml/min) (D, I) and 3465 cm/h (2.0 ml/min) (E, J). Proteins (5 µg): 1, ribonuclease-B; 2, chick lysozyme; 3, bovine serum albumin; 4, myoglobin; 5, carbonic anhydrase; 6, ovalbumin.

Figure 4. Protein recovery by rapid reversed-phase liquid chromatography. Supports: RP-300 (panel A); Poros RII/H (panel B). Chromatographic conditions are as described in Fig. 3. Sample: 5 µg ribonuclease-B initially injected onto the respective columns. Once eluted, the sample was collected into a 1.5-ml polypropylene tube. The column was then re-equilibrated with at least 20 column volumes of Buffer A, and the collected peak was diluted 1:1 with Buffer A and reapplied. Recovery was calculated from peak heights. All experiments were performed in duplicate. Flow rate symbols: (♦) 173 cm/h, 0.1 ml/min; (▲) 866 cm/h, 0.5 ml/min; (•) 1732 cm/h, 1.0 ml/min; (■) 3465 cm/h, 2.0 ml/min.
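
Recovery in the reinjection experiments was calculated from peak heights. The Python sketch below shows that bookkeeping, assuming the whole collected peak is reapplied and that peak height is proportional to the amount injected. The peak heights are invented to illustrate a constant 98% recovery per step, which compounds to 0.98² ≈ 0.96, consistent with the ~96% quoted for the RP-300 support after two reinjections.

```python
def recoveries(peak_heights):
    """Per-step and overall recovery from successive peak heights.

    peak_heights[0] is the initial injection; each later value is the
    reinjection of the peak collected from the previous run. Assumes peak
    height is proportional to the amount of protein injected.
    """
    if len(peak_heights) < 2:
        raise ValueError("need an initial injection and at least one reinjection")
    per_step = [later / earlier for earlier, later in zip(peak_heights, peak_heights[1:])]
    overall = peak_heights[-1] / peak_heights[0]
    return per_step, overall

# Hypothetical peak heights (arbitrary units): initial run plus two reinjections.
heights = [100.0, 98.0, 96.04]
per_step, overall = recoveries(heights)
print([f"{r:.1%}" for r in per_step], f"overall {overall:.1%}")
```

Measured overall recoveries need not equal the product of the per-step values exactly, since each step carries its own experimental error.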

The effect of linear flow velocity on the chromatographic separation of a tryptic digest of
cytochrome-c on conventional and macroporous reversed-phase packings is compared in Fig. 5,
panels A-J. The profiles at 0.1 ml/min show that the chromatographic efficiency of the
conventional silica-based packing exceeds that of the macroporous packing. The greater (~30%)
“peak capacity” (i.e., the number of peaks that can be resolved in a given chromatographic
separation) of the silica-based packing likewise shows that it has a greater chromatographic
efficiency than the macroporous packing. In contrast to earlier reports (Kassel et al., 1994),
the chromatographic efficiency of the silica-based support also exceeds that of the macroporous
support at high flow rates (e.g., 3500 cm/h; compare Fig. 5E and J). The selectivity
differences between the two packings used in this study (compare peaks 1 and 2 in Fig. 5A and
F) indicate that they could be used in series in a multidimensional peptide purification
strategy.

Figure 5. Rapid reversed-phase HPLC peptide mapping. Sample: 20 µg tryptic digest of cytochrome-c. Columns: RP-300 2.1 mm ID cartridge (panels A - E); Poros RII/H 2.1 mm ID (panels F - J). Chromatographic conditions are as described in Fig. 3.

Figure 6. In-gel versus in-solution tryptic digestion of phosphorylase-b. Peptide maps were obtained by fast RP-HPLC using a Brownlee RP-300 100 mm x 2.1 mm I.D. column. Chromatographic conditions: a linear 6-ml gradient from 0-100% B; solvent A, aqueous 0.1% TFA; solvent B, aqueous 0.1% TFA / 60% CH3CN. Flow: 0.5 ml/min (866 cm/h). Panels A, B & C: control tryptic digests in solution using 2, 5 and 10 µg phosphorylase-b, respectively. Panels D, E & F: in-situ gel tryptic digests of 2, 5 and 10 µg of phosphorylase-b, respectively. Reproduced with permission from Moritz et al., 1994.
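
Peak capacity in gradient elution is often estimated as P ≈ 1 + t_G/w, where t_G is the gradient time and w the mean baseline peak width. The Python sketch below uses that common approximation; it is not stated to be how the peak capacities above were determined, and the two peak widths are hypothetical values chosen only to show how narrower peaks translate into a roughly 30% higher capacity.

```python
def peak_capacity(gradient_time_min, mean_peak_width_min):
    """Approximate gradient peak capacity P = 1 + t_G / w (w: mean baseline width)."""
    if mean_peak_width_min <= 0:
        raise ValueError("peak width must be positive")
    return 1 + gradient_time_min / mean_peak_width_min

T_G = 60.0                                   # 6-ml gradient at 0.1 ml/min
p_silica = peak_capacity(T_G, 0.60)          # hypothetical mean width, silica packing
p_macroporous = peak_capacity(T_G, 0.78)     # hypothetical mean width, macroporous packing
print(f"P(silica) ~ {p_silica:.0f}, P(macroporous) ~ {p_macroporous:.0f}, "
      f"ratio {p_silica / p_macroporous:.2f}")
```

With these assumed widths the ratio is about 1.3; in practice w would be measured from the chromatograms.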

Rapid  Peptide-Mapping  of  Acrylamide  Gel-Resolved  Proteins 

Several internal amino acid sequencing strategies for electrophoretically separated proteins have been developed over the past few years (see Ward et al., 1990a; Rosenfeld et al., 1992; Patterson, 1994 and references therein). An excellent practical assessment of these methods by the Association of Biomedical Resource Facilities (ABRF) was published in 1992 (Stone, 1992) and 1993 (Williams et al., 1993). In an earlier report (Ward et al., 1990a) we described our in-gel digestion strategy, which relies on first removing SDS from the CBR-250-stained gel prior to in-gel enzymatic digestion, followed by an extensive acid extraction of the generated peptides. To further reduce the overall time of the procedure, we omitted some of the TFA extraction steps (see Materials and Methods) without compromising the overall yield of recovered peptides. Additionally, we have replaced the initial SDS removal step with a dilute 1% ammonium bicarbonate / acetonitrile extraction step. Using varying amounts of phosphorylase-b (Mr 97,000) run on pre-cast 10% gels (Novex), we compared the peptide recoveries of in-gel tryptic digests with those of in-solution digests (Fig. 6A-F). The peptide map profiles of the in-gel and solution digests compare favourably, even with 20 pmol (2 µg) of protein. Based on absorbance at 214 nm, the recovery of peptides from the in-gel proteolysis is ~80% of that from the control in-solution digests. Comparable data were obtained using standard proteins of lower Mr, such as lysozyme and β-lactoglobulin (data not shown). To minimize possible interference by detergent with electrospray ionization, we evaluated peptide recoveries in the absence of any added Tween 20. The data shown in Fig. 7A-F reveal that omission of Tween does not seriously affect peptide recoveries, even at low quantities of protein (Fig. 7F).

Figure 7. In-gel tryptic digestion of phosphorylase-b: effect of Tween-20. Peptide maps were obtained by fast RP-HPLC using a Brownlee RP-300 100 mm × 2.1 mm I.D. column. Chromatographic conditions: a linear 6 ml gradient from 0-100% B; solvent A, aqueous 0.1% TFA; solvent B, aqueous 0.1% TFA / 60% CH3CN. Flow: 0.5 ml/min (866 cm/h). Panels A, B & C: in-situ gel tryptic digests containing 0.02% Tween-20 during digestion and extraction of 10, 5 and 2 µg of phosphorylase-b, respectively. Panels D, E & F: control in-situ gel tryptic digests using 10, 5 and 2 µg phosphorylase-b, respectively.
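The amounts in this section are given both in pmol and in µg; the two are linked through the relative molecular mass (µg = pmol × Mr × 10⁻⁶). A minimal Python sketch of that conversion follows, assuming the Mr of 97,000 stated for phosphorylase-b; the function and constant names are illustrative only.

```python
def pmol_to_ug(pmol: float, mr: float) -> float:
    """Convert picomoles to micrograms for a protein of relative molecular mass mr."""
    # 1 pmol = 1e-12 mol; mol * (g/mol) = g; 1 g = 1e6 ug
    return pmol * 1e-12 * mr * 1e6


def ug_to_pmol(ug: float, mr: float) -> float:
    """Convert micrograms to picomoles (inverse of pmol_to_ug)."""
    return ug * 1e-6 / mr * 1e12


PHOSPHORYLASE_B_MR = 97_000  # Mr quoted in the text

if __name__ == "__main__":
    print(f"20 pmol = {pmol_to_ug(20, PHOSPHORYLASE_B_MR):.2f} ug")
    # gel loadings used in Fig. 7
    for ug in (10, 5, 2):
        print(f"{ug} ug = {ug_to_pmol(ug, PHOSPHORYLASE_B_MR):.1f} pmol")
```

Run as a script, this gives 1.94 µg for 20 pmol, consistent with the rounded "2 µg" above, and roughly 103, 52 and 21 pmol for the 10, 5 and 2 µg gel loadings.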

Examples  of  the  Application  of  Rapid  Peptide-Mapping  of 
2-DE-Resolved  Proteins 

Over the past few years, we have been identifying 2-DE-separated proteins from various human colorectal cancer cell lines by sequence and mass analysis, as part of an ongoing program directed towards identifying specific colon tumour markers (Ward et al., 1990c; Ji et al., 1993, 1994). Examples of the power of this rapid peptide-mapping approach are given in Figs. 8 and 9 for proteins #1 and #4 isolated from the colorectal cell line LIM 1863. Four CBR-250-stained protein spots from identical gels were digested with trypsin, and the digest mixtures were chromatographed at 0.5 ml/min on a conventional 2.1 mm I.D. reversed-phase cartridge using a 6.0 ml linear gradient of acetonitrile in 0.1% TFA. For protein spot #1, the partial sequence data obtained at 48-55 pmol levels (Fig. 8) were used to search the available protein sequence databases and permitted the unambiguous identification of this protein as thioredoxin. For protein spot #4 (Fig. 9), peptide T4 was sequenced directly, while the peptide fraction containing peptides T1-3 was further resolved by rapid second-dimension chromatography on the same column, but using a modified mobile phase of 1% NaCl / 50% acetonitrile (Fig. 9B), before the component peptides were subjected to sequence analysis. The partial sequence data obtained at 5-17 pmol levels (data not shown) identified this protein as heat shock protein 60 (HSP-60).
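The flow rates in this section are also quoted in the figure legends as linear velocities (0.5 ml/min = 866 cm/h on the 2.1 mm I.D. column). That figure is reproduced by dividing the volumetric flow by the column's cross-sectional area; the sketch below assumes this superficial (empty-column) definition, which is our reading rather than one stated in the text, and the function name is illustrative.

```python
import math


def linear_velocity_cm_per_h(flow_ml_per_min: float, column_id_mm: float) -> float:
    """Superficial linear velocity: volumetric flow / column cross-sectional area.

    Column porosity is ignored, so the true interstitial velocity is higher.
    """
    radius_cm = column_id_mm / 10 / 2       # mm -> cm, diameter -> radius
    area_cm2 = math.pi * radius_cm ** 2
    flow_cm3_per_h = flow_ml_per_min * 60   # 1 ml = 1 cm^3
    return flow_cm3_per_h / area_cm2


if __name__ == "__main__":
    # 2.1 mm I.D. cartridge run at 0.5 ml/min
    print(f"{linear_velocity_cm_per_h(0.5, 2.1):.0f} cm/h")
```

This prints 866 cm/h, matching the legends, which suggests the quoted values are superficial velocities.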


SUMMARY

This report describes a rapid (10 min) chromatographic approach for separating proteins and peptides on conventional silica-based reversed-phase packings employing a standard liquid chromatograph. An improved in-gel enzymatic digestion strategy is described. Examples are given for peptide maps of phosphorylase-b from a 1-D gel and of 2-DE protein spots from the colorectal cancer cell line LIM 1863. In conjunction with microsequencing and mass spectrometric peptide-mass fingerprinting technologies, this approach may facilitate a rapid expansion of 2-DE gel protein databases.

[Figure 8 sequence data. Peptide #1: VGEFSGANK; yield per cycle (pmol): 55, 38, 54, 55, 8, 42, 49, 45, 24; repetitive yield 98.7% (HP-G1005A). Peptide #2: FHSLSEK; yield per cycle (pmol): 48, 43, 6, 31, 6, 27, 13; repetitive yield 88.1% (HP-G1005A). Chromatogram x-axis: retention time (min).]

Figure 8. Rapid peptide mapping of colorectal cancer cell line LIM 1863 protein #1. Coomassie blue-stained protein #1 from 4 identical 2-D gels was digested in-gel with trypsin, as described in Materials and Methods, and chromatographed on a conventional silica-based support (Brownlee RP-300) as described in Fig. 3. First chromatographic dimension (Panel A): linear 6 ml gradient 0-100% B; solvent A, aqueous 0.1% TFA; solvent B, aqueous 0.1% TFA / 60% CH3CN. Flow: 0.5 ml/min (866 cm/h). Sequence information obtained from T1 and T2 is shown. Protein identified as thioredoxin.
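Figure 8 reports a repetitive yield for each peptide as calculated by the sequencer software (HP-G1005A); the text does not say how that value was derived. A common generic approach is to fit ln(yield) against cycle number, and the minimal sketch below does that. It is not the HP software's algorithm, and the choice of which cycles to include is left to the caller (a hypothetical `cycles` parameter) because residues such as Ser are often recovered poorly, as the 8 pmol at cycle 5 of peptide #1 illustrates.

```python
import math
from typing import Optional, Sequence


def repetitive_yield(yields_pmol: Sequence[float], cycles: Optional[Sequence[int]] = None) -> float:
    """Estimate the repetitive yield R by a least-squares fit of ln(yield) on cycle number.

    Assumes yield(n) = I0 * R**(n - 1). `cycles` holds 1-based cycle numbers to
    include in the fit; by default every cycle is used.
    """
    if cycles is None:
        cycles = range(1, len(yields_pmol) + 1)
    xs = [float(n) for n in cycles]
    ys = [math.log(yields_pmol[n - 1]) for n in cycles]
    if len(xs) < 2:
        raise ValueError("need at least two cycles to fit a slope")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return math.exp(sxy / sxx)  # the fitted slope of ln(yield) vs cycle is ln(R)


if __name__ == "__main__":
    peptide_1 = [55, 38, 54, 55, 8, 42, 49, 45, 24]  # VGEFSGANK, Fig. 8
    print(f"all cycles: {repetitive_yield(peptide_1):.1%}")
    # excluding the poorly recovered Ser at cycle 5 (an arbitrary choice for illustration)
    print(f"without cycle 5: {repetitive_yield(peptide_1, [1, 2, 3, 4, 6, 7, 8, 9]):.1%}")
```

Because the result depends strongly on which residues are fitted, these estimates need not reproduce the 98.7% and 88.1% shown in the figure.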
Figure 9. Rapid peptide mapping of colorectal cancer cell line LIM 1863 protein #4. Coomassie blue-stained protein #4 from 4 identical 2-D gels was digested in-gel with trypsin, as described in Materials and Methods, and chromatographed on a conventional silica-based support (Brownlee RP-300) as described in Fig. 3. (A) First chromatographic dimension: linear 6 ml gradient 0-100% B; solvent A, aqueous 0.1% TFA; solvent B, aqueous 0.1% TFA / 60% CH3CN. Flow: 0.5 ml/min (866 cm/h). (B) Second chromatographic dimension: the peptide fraction containing peptides T1, T2 & T3 (see collection bar) from Fig. 9A was rechromatographed on the same column but using a linear 5 ml gradient from 0-50% B; solvent A was aqueous 1% NaCl, pH 6.5, and solvent B was acetonitrile. Flow rate: 0.5 ml/min (866 cm/h). Protein identified as heat-shock protein 60 (HSP-60; data not shown). Reproduced with permission from Moritz et al., 1994.
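The gradients in Figs. 7-9 are specified by volume rather than by time. At constant flow the run time is simply t = V/F, and the acetonitrile slope follows from the %B slope and the acetonitrile content of solvent B. A minimal sketch, assuming solvent A contains no acetonitrile (as the legends indicate) and ignoring the system dwell volume; the class and constant names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class LinearGradient:
    volume_ml: float        # gradient volume as stated in the legend
    flow_ml_per_min: float  # pump flow rate
    start_pct_b: float
    end_pct_b: float
    acn_in_b_pct: float     # % acetonitrile in solvent B (solvent A assumed to contain none)

    def duration_min(self) -> float:
        return self.volume_ml / self.flow_ml_per_min

    def slope_pct_b_per_min(self) -> float:
        return (self.end_pct_b - self.start_pct_b) / self.duration_min()

    def slope_acn_pct_per_min(self) -> float:
        return self.slope_pct_b_per_min() * self.acn_in_b_pct / 100


FIRST_DIMENSION = LinearGradient(6.0, 0.5, 0, 100, 60)   # B = 0.1% TFA / 60% CH3CN
SECOND_DIMENSION = LinearGradient(5.0, 0.5, 0, 50, 100)  # B = acetonitrile

if __name__ == "__main__":
    for name, g in (("first", FIRST_DIMENSION), ("second", SECOND_DIMENSION)):
        print(f"{name}: {g.duration_min():.1f} min, "
              f"{g.slope_pct_b_per_min():.2f} %B/min, "
              f"{g.slope_acn_pct_per_min():.2f} % CH3CN/min")
```

Both dimensions work out to the same acetonitrile slope of 5% per minute: the first-dimension gradient runs for 12 min at about 8.3 %B/min, the second for 10 min at 5 %B/min.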
INTRODUCTION

Sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS PAGE) (Laemmli 1970) is still the most powerful method of resolving a complex mixture of proteins. However, it is generally used as an analytical rather than a preparative tool. With the advent of PVDF membranes, which are stable under the conditions of the Edman degradation, it has become common to try to obtain N-terminal sequence data from proteins separated by SDS PAGE and electrophoretically transferred to a PVDF membrane (Matsudaira 1987). It is also possible to electroelute proteins from gel slices for sequencing or enzymic digestion. Alternatively, the proteins can be enzymically digested within the gel matrix, or directly on the membrane after transfer, and the peptides eluted for subsequent HPLC purification (Bauw 1989). Both procedures require the proteins to be stained after electrophoresis to locate their position in the gel or on the membrane. The staining process generally fixes the proteins and leads to significant loss of material.

Some of these problems may be overcome by pre-electrophoretic labelling (Kraft 1988). A simple method has been developed for pre-labelling proteins with the water-soluble Edman reagent S-DABITC (see figure 1) (Chang 1989), which couples to the N-terminal amino group and to the ε-amino group of lysine. The reaction takes place under very mild conditions, and the reagent has been used to identify reactive lysines on the surface of protein molecules (Chang 1992). By denaturing proteins in the presence of SDS it is possible to label most of the available sites on a molecule. The procedure provides a simple way of generating coloured marker proteins for electrophoresis, which can be used in preparative electrophoresis apparatus or on SDS PAGE gels. More importantly, the labelled proteins can still be sequenced after the labelling procedure and electrophoretic separation; the N-terminal label is removed during the first cycle of Edman degradation. The labelled molecules can either be transferred to a suitable membrane for direct sequencing or, since no fixing or further staining is required, passively eluted from the gel and collected on a Prospin cartridge or similar device. Passive elution is especially useful for high molecular weight proteins, where it is often necessary to collect material from several gels to obtain enough for sequencing. The coloured label also makes the proteins easier to keep track of. Further, the label does not interfere with enzymic or chemical digestion, and lysine-containing peptides are readily identified during HPLC separation because they have a characteristic absorption at 450 nm. The procedure has been tested on several proteins and found to be a practical method of labelling and recovering proteins and peptides for sequencing.

Figure 1. The structure of 4-N,N-dimethylaminoazobenzene-4'-isothiocyanate-2'-sulphonic acid, S-DABITC.


MATERIALS  AND  METHODS 

β-Lactoglobulin, Problott and Prospin devices were from Applied Biosystems; bovine serum albumin (BSA), ribonuclease and carbonic anhydrase were from Sigma. S-DABITC was from Protein Institute, P.O. Box 550, Broomall PA, 19008-0550 U.S.A. The reaction buffer for labelling reactions was 20 mM sodium bicarbonate pH 8.3, with or without 0.1% SDS. Proteins were labelled for 30 minutes at 60°C. Electrophoretic transfer to Problott was performed in a Bio-Rad mini-gel blotting apparatus using either 25 mM Tris-glycine pH 8.5 or 10 mM CAPS buffer pH 11, both containing 10% methanol. Proteins were transferred at 100 volts constant voltage. Cyanogen bromide (CNBr) cleavage was performed by incubating the protein with 100 µl of a saturated solution of CNBr in 70% formic acid overnight, in the dark, at room temperature. Protein sequencing was performed on an Applied Biosystems 476A protein sequencer using the FSTBLT cycle. The instrument was equipped with a model 610A data collection system.


RESULTS  AND  DISCUSSION 

Presented here is a simple method of labelling proteins with a coloured, water-soluble Edman-type reagent. The labelled proteins can be used as coloured markers during SDS PAGE and show a small increase in apparent molecular weight, probably due to the mass contributed by the label. Figure 2 shows the behaviour of several S-DABITC-labelled proteins on SDS PAGE compared with Coomassie blue-stained marker proteins. The labelled proteins are yellow at pH values greater than 7, but appear red under acid conditions.

A major advantage of labelling proteins with S-DABITC is that they can still be sequenced after electroblotting, and there is no requirement to stain the blot to locate the proteins. This is illustrated in figure 3, which shows the degradation of 50 pmol of electroblotted S-DABITC-labelled β-lactoglobulin. The N-terminal leucine is not visible, since the HPLC conditions are not designed for detection of the S-DABPth derivative. However, no Pth-leucine is visible either, showing that the N-terminus of the protein was completely labelled. A small amount of Pth-isoleucine (the second amino acid in the β-lactoglobulin sequence) is present in cycle 1; this is probably due to partial removal of the N-terminal leucine during the FSTBGN cycle of the sequencer run. Pretreating the blotted sample with TFA before loading it into the sequencer overcomes this preview problem: as shown in figure 4, only the second residue of β-lactoglobulin, isoleucine, is visible on the chromatogram, again indicating that labelling of the N-terminus was complete. Alternatively, the FSTBGN cycle could be modified. Cycle 7 in figure 4 shows a small amount of Pth-lysine, which indicates that the ε-amino group of this residue was not completely derivatised with S-DABITC in this sample.

Figure 2. SDS PAGE showing individual labelled proteins and Coomassie blue-stained marker proteins.

Figure 3. Edman degradation of 50 pmol of β-lactoglobulin labelled with S-DABITC before SDS PAGE and transfer to Problott.

Figure 4. Edman degradation of 50 pmol of β-lactoglobulin labelled with S-DABITC before SDS PAGE and transfer to Problott. Prior to loading into the sequencer, the blot was treated with 20 µl of TFA, dried in vacuo and washed with ethyl acetate in order to remove the derivatised N-terminal amino acid.

Figure 5. Edman degradation of 50 pmol of β-lactoglobulin labelled with S-DABITC before SDS PAGE. The protein was passively eluted into water and then captured on an Applied Biosystems Prospin device.

A further advantage of pre-electrophoretic labelling is that the gels need not be stained to visualise the proteins. The proteins are therefore neither contaminated by staining procedures nor fixed in any way, and they can be passively eluted from the excised bands by placing them in water. Figure 5 shows the results of sequencing passively eluted S-DABITC-labelled β-lactoglobulin. After elution the protein was captured on a Prospin cartridge for sequencing. Prior to sequencing, the membrane was treated with TFA to cleave off the N-terminal leucine. A small amount of Pth-leucine is present in cycle 1, indicating that derivatisation of the N-terminus was not 100% in this case.

To demonstrate that the label can be used to isolate labelled peptides, 200 pmol of BSA was labelled and digested with cyanogen bromide; the fragments were separated by SDS PAGE and electroblotted onto Problott. A parallel reaction was set up with unlabelled BSA, which was separated on the same gel and transferred to Problott. The S-DABITC-labelled blot and the Ponceau S-stained unlabelled blot are shown in figure 6 and demonstrate similar sensitivity. The results of




K.  Ashman 


Figure 6. The SDS PAGE separation of the cyanogen bromide fragments generated from BSA labelled with S-DABITC prior to digestion, shown after electrophoretic transfer to Problott. Panels: S-DABITC (labelled BSA); Ponceau S (unlabelled BSA).


Figure 7. Edman degradation of a cyanogen bromide fragment of BSA. The BSA was labelled with S-DABITC prior to digestion. The fragments were separated by SDS PAGE and electrophoretically transferred to Problott for sequencing.
Figure 8. SDS PAGE showing the S-DABITC labelling of a cell lysate of Ostertagia circumcincta.


Table 1. Some uses of pre-electrophoretic labelling

•  To make PAGE markers visible during electrophoresis

•  Coloured proteins are easy to track

•  Edman degradation not hindered

•  No additional staining or fixing allows easy passive elution

•  Tracking proteins for preparative electrophoresis, e.g. HPEC, Prep Cell

•  Isolation of labelled peptides by HPLC


sequencing of one of the bands on the S-DABITC-labelled blot are shown in figure 7. Two sequences, AD and RE, corresponding to the N-termini of two BSA CNBr fragments, are present.
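Reading two N-terminal sequences from one band is expected when CNBr fragments co-migrate. The sketch below predicts the fragments of a known sequence so that observed N-termini (such as AD and RE in figure 7) could be matched against candidate cleavage sites. It is an idealised model, the function names are illustrative, and the example sequence is hypothetical rather than BSA.

```python
def cnbr_fragments(sequence: str) -> list[str]:
    """Idealised cyanogen bromide digest: cut on the C-terminal side of every Met (M).

    Simplifications: cleavage is assumed complete (Met-Ser and Met-Thr bonds are
    often cleaved poorly in practice), and the C-terminal Met of each fragment is
    kept as 'M' although CNBr converts it to homoserine/homoserine lactone.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        # a Met at the very end of the chain leaves nothing to cleave off
        if residue == "M" and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def n_terminal_residues(fragments: list[str], n: int = 2) -> list[str]:
    """First n residues of each fragment, i.e. what Edman degradation reads first."""
    return [fragment[:n] for fragment in fragments]


if __name__ == "__main__":
    frags = cnbr_fragments("PEPTMIDEMKR")  # hypothetical sequence, not BSA
    print(frags)                       # ['PEPTM', 'IDEM', 'KR']
    print(n_terminal_residues(frags))  # ['PE', 'ID', 'KR']
```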

The ability of S-DABITC to label a complex mixture was tested by reacting a cell lysate of Ostertagia circumcincta with the reagent. The labelled protein mixture was separated by SDS PAGE, and the result is shown in figure 8. Many labelled bands are visible, clearly showing that it is possible to label complex mixtures.

The results presented here demonstrate that proteins can be efficiently labelled with the water-soluble Edman reagent S-DABITC. The labelled molecules are visible during SDS PAGE separation and can be sequenced either after electroblotting or after passive elution from the gel. The latter method may be useful for accumulating and concentrating protein for sequencing or digestion, especially in the case of high molecular weight proteins, where electrotransfer is often difficult. The label is stable to cyanogen bromide digestion, and the labelled peptides can be isolated. The reagent clearly has potential as a tool in protein sequence analysis; it may be a useful alternative to PITC for sequencing, and work is in progress to assess this possibility.
ABSTRACT

Capillary electrophoresis and laser-based photothermal detection are used to analyze minute amounts of PTH amino acids. The technology is demonstrated for the analysis of manual Edman degradation reactions, and is also used to analyze the products generated by a highly miniaturized automated protein sequencer.


INTRODUCTION 

The determination of the primary amino acid sequence of minute amounts of proteins remains important in biology. Current technology relies on the repetitive application of the Edman degradation reaction (1). In this reaction, the N-terminal amino group of the peptide reacts with phenylisothiocyanate (PITC) under basic conditions to form the phenylthiocarbamyl (PTC) derivative. After excess reagent is extracted, the PTC-peptide is treated with anhydrous acid to cleave off the N-terminal residue as the cyclic phenylthiazolinone amino acid. In the process, the peptide is truncated by one amino acid residue. Last, the thiazolinone is extracted from the truncated peptide and treated with aqueous acid to produce the stable phenylthiohydantoin (PTH) amino acid. Two common side products are also produced in the sequencing reaction: diphenylthiourea (DPTU) and dimethylphenylthiourea (DMPTU). Cysteine does not survive the Edman degradation reaction. As a result, there are 19 possible PTH amino acid products for unmodified amino acids, plus the two main interfering products.

Manual protein sequencing involves a laborious series of reactions and extractions to isolate the PTH amino acid products from each cycle of the Edman degradation reaction. The development of automated protein sequencers led not only to a significant increase in efficiency and reproducibility but also to the use of smaller amounts of reagents, which allows smaller amounts of peptide to be studied (2-4). Miniaturization of the automated sequencer has led to improved sequencing sensitivity. This approach has been quite successful for one fundamental reason: the major impediment to sequence determination is reagent impurity and contamination. Reducing reagent consumption reduces contamination, so smaller amounts of protein may be sequenced.

In all examples of protein sequencing, either thin layer chromatography or high performance liquid chromatography has been used to identify the PTH amino acid products. In general, detection of less than one picomole of PTH amino acid is difficult. While the use of microbore columns may offer some sensitivity advantages, the limits of liquid chromatography appear to be in sight.

We have reported an alternative technology for detection of minute amounts of PTH amino acids. This technology is based on micellar capillary electrophoresis for separation and a laser-based thermo-optical absorbance technique for detection (5). Micellar capillary electrophoresis relies on the addition of surfactant to the separation buffer in zone electrophoresis. The technique has been used to separate 22 PTH amino acids in 28 minutes (6).

Recently, we have studied the effect of SDS concentration, buffer concentration and pH on the separation of a mixture of nineteen PTH amino acids (PTH-cysteine was excluded) and two common by-products formed during Edman degradation: diphenylthiourea (DPTU) and dimethylphenylthiourea (DMPTU) (7). Many of the components in this mixture are sensitive to their immediate environment, a problem also encountered in HPLC. PTH-histidine (pKa ≈ 6) is especially sensitive to pH during the separation. We have achieved baseline separation of the 19 PTHs, DPTU and DMPTU within 10 minutes with a pH 6.7 buffer consisting of 10.7 mM sodium phosphate, 1.8 mM sodium tetraborate and 25 mM SDS, at ambient temperature. Thermo-optical absorbance provides detection limits (3σ) for the PTH amino acids that range from 0.2 to 5 fmol injected onto the column. This is almost 1,000 times better than the detection limits of the HPLC methods currently in use.

While these separation and detection capabilities are outstanding, it is important to understand one property of the technology: samples must be injected in small volumes, on the order of a few nanoliters, because injection of larger volumes leads to unacceptable band-broadening, which destroys the separation. As a result, there is a fundamental mismatch between the volumes produced by commercial protein sequencers and the volumes required for capillary electrophoresis.
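The volume mismatch can be made concrete with a little arithmetic. In the sketch below the 5 nL injection is within the "few nanoliters" quoted above, but the 50 µL sample volume is a hypothetical value for a conventional sequencer fraction, not a figure from the text; the function names are illustrative.

```python
def injected_fraction(injection_nl: float, sample_ul: float) -> float:
    """Fraction of a redissolved sample that is actually loaded onto the capillary."""
    return (injection_nl * 1e-9) / (sample_ul * 1e-6)  # both volumes in litres


def required_amount_fmol(detection_limit_fmol: float, injection_nl: float, sample_ul: float) -> float:
    """Amount that must be present in the whole sample so that the injected
    portion reaches the detection limit."""
    return detection_limit_fmol / injected_fraction(injection_nl, sample_ul)


if __name__ == "__main__":
    # hypothetical volumes: a 5 nL injection taken from a 50 uL sequencer fraction
    frac = injected_fraction(5, 50)
    need = required_amount_fmol(5, 5, 50)  # 5 fmol: upper end of the quoted detection limits
    print(f"fraction injected: {frac:.0e}; amount needed in sample: {need / 1000:.0f} pmol")
```

Under these assumptions only one part in 10,000 of the fraction reaches the capillary, so even with a 5 fmol detection limit each cycle would have to deliver about 50 pmol, which removes most of the detector's advantage.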

In this paper, we demonstrate the performance of capillary electrophoresis for analysis of the products generated by manual Edman degradation. The electrophoretic analysis is much faster and much easier to perform than gradient elution reversed-phase liquid chromatography. We also demonstrate the use of capillary electrophoresis for analysis of the products generated by a highly miniaturized protein sequencer.


EXPERIMENTAL 

Manual  Edman  Degradation 

The  method  for  manual  protein  sequencing  is  described  in  detail  elsewhere  (7). 

Micellar  Capillary  Electrophoresis 

Determination of PTH amino acids was performed using the CE/thermo-optical absorbance instrument described in detail elsewhere (4, 7), with a few changes: the pump laser was operated at 625 Hz; a neutral density filter (O.D. = 0.3) was placed in the beam path to reduce the beam intensity; the probe beam intensity was detected by a variable gain/variable bandwidth Model 2001 Front-end Optical Receiver (New Focus, Inc., Mountain View, California, USA); and data were collected at 3 Hz directly from the lock-in amplifier to the PC via an RS232 interface. A program was written in BASIC for data collection and display.

Automated  Protein  Sequencing 

Sequencing-grade 12.5% trimethylamine in water and 5% phenylisothiocyanate in heptane were purchased from Applied Biosystems. Anhydrous trifluoroacetic acid, Polybrene, toluene (redistilled) and oxidized insulin chain B were purchased from Sigma. Acetonitrile (HPLC grade) was purchased from BDH Chemicals Canada. Argon gas was bubbled through the trimethylamine and trifluoroacetic acid solutions to deliver these reagents in vapor form to the reactor. Acetonitrile and toluene were mixed in a ratio of 15:85. The acetonitrile:toluene mixture and phenylisothiocyanate were pumped as liquids through the reaction chamber by argon gas pressure. Insulin chain B was dissolved in 8.3% trimethylamine in propanol:water, 3:2 (v/v), adjusted to pH 9.5 with trifluoroacetic acid, and loaded into the reaction chamber with a 1-µL syringe.

The reaction chamber was constructed from fused silica capillaries with an external polyimide coating. A 350-µm inner diameter, 500-µm outer diameter capillary was inserted about 1 cm into a 530-µm inner diameter, 700-µm outer diameter, 5-cm long capillary, and the two pieces were epoxied together. A 4-mm long bed of Porasil-T coated with 20% (w/w) Polybrene was placed in the larger capillary. A Zitex plug was inserted at either end of the bed to hold the Porasil packing in place. The reaction chamber was precycled once using the reaction protocol described below. The reaction chamber was flushed with argon gas, and 890 picomoles of insulin chain B was loaded.
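For scale, the geometric volume of the 4-mm reaction bed in the 530-µm I.D. capillary is πr²L. The sketch computes this empty-tube volume only; the Porasil packing and the Zitex plugs reduce the actual liquid volume, which the text does not give. The function name is illustrative.

```python
import math


def cylinder_volume_ul(inner_diameter_um: float, length_mm: float) -> float:
    """Geometric (empty-tube) volume of a cylindrical bed in microlitres."""
    radius_mm = inner_diameter_um / 1000 / 2
    return math.pi * radius_mm ** 2 * length_mm  # 1 mm^3 == 1 uL


if __name__ == "__main__":
    bed = cylinder_volume_ul(530, 4)  # 4-mm Porasil bed in the 530-um I.D. capillary
    print(f"bed volume ~ {bed:.2f} uL")
```

This gives roughly 0.9 µL for the empty bed.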

The  reaction  chamber  was  heated  by  Peltier  devices  purchased  from  Melcor.  A 
Digi-sense  thermocouple  was  used  to  monitor  the  temperature  of  the  reaction  chamber. 

Prepurified argon was passed through an oxygen trap and distributed by a gas manifold. The argon was used to pressurize all reagents to 3.7 psig. The reagent bottles were connected to a valve block originally designed for use with the Beckman spinning cup sequencer. The valve block has three single-position valves and one seven-position valve; only the latter was used in this experiment. Reagents were delivered by opening the appropriate valve inlet and allowing the pressurized argon to push the reagent to the valve block. The liquid reagents were then pushed through the reaction chamber by a stream of argon.

The conditions used for the sequencing reaction are listed in Table 1.

Table 1.

Step               Reagent                  Time (s)
Coupling, 57 °C    12.5% TMA                60
                   5% PITC                  5
                   12.5% TMA                300
                   5% PITC                  5
                   12.5% TMA                300
                   Argon
Cleavage           TFA                      180
                   Argon                    180
Extraction         Acetonitrile/toluene     5
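Table 1 can also be read as a simple cycle program. The sketch below encodes it as data and sums the specified step times. The row order is our reconstruction of the flattened table, the cleavage reagent is taken to be the anhydrous TFA listed under Automated Protein Sequencing, and the coupling argon step is left unspecified because the table gives no time for it; all names are illustrative.

```python
from typing import List, Optional, Tuple

# (step, reagent, time in s) in the order reconstructed from Table 1.
# None marks the coupling argon step, whose time is not given in the table.
Step = Tuple[str, str, Optional[int]]

CYCLE: List[Step] = [
    ("coupling, 57 C", "12.5% TMA", 60),
    ("coupling, 57 C", "5% PITC", 5),
    ("coupling, 57 C", "12.5% TMA", 300),
    ("coupling, 57 C", "5% PITC", 5),
    ("coupling, 57 C", "12.5% TMA", 300),
    ("coupling, 57 C", "argon", None),
    ("cleavage", "TFA", 180),
    ("cleavage", "argon", 180),
    ("extraction", "acetonitrile/toluene", 5),
]


def known_time_s(program: List[Step]) -> int:
    """Sum of the step times that are specified."""
    return sum(t for _, _, t in program if t is not None)


def unspecified_steps(program: List[Step]) -> List[Tuple[str, str]]:
    """Steps whose duration is missing and must be added before the total is known."""
    return [(step, reagent) for step, reagent, t in program if t is None]


if __name__ == "__main__":
    total = known_time_s(CYCLE)
    print(f"specified steps: {total} s ({total / 60:.1f} min)")
    print("plus unspecified:", unspecified_steps(CYCLE))
```

The specified steps add up to 1035 s, a little over 17 min per cycle before the unspecified argon step is included.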
The extracted anilinothiazolinone was collected into a 600-µL microcentrifuge tube containing 10 µL of 25% trifluoroacetic acid in water. The extract was then dried in a Speed-Vac. Conversion to the PTH amino acid was performed by dissolving the extract in 50 µL of 25% trifluoroacetic acid and heating at 62 °C for 30-35 minutes. The product was dried in the Speed-Vac and stored at -20 °C.


RESULTS  AND  DISCUSSION 

Separation  of  Manual  Edman  Degradation  Products 

SP-5 is a pentapeptide with the sequence NH2-arginine-lysine-glutamic acid-valine-tyrosine-COOH (NH2-R-K-E-V-Y-COOH). Figure 1 shows the electropherograms for the sequence analysis of 865 nmol of SP-5. The standard contains approximately 20 fmol of each PTH amino acid, DPTU and DMPTU; no PTH-cysteine is present in the standard. The electropherograms show good signal-to-noise for the analyte PTH residue because of the very large amount of starting material used. In the first cycle, a few unidentified peaks are seen besides the DPTU by-product peak. Extensive wash steps apparently remove DMPTU completely. Each cycle shows a slight amount of lag (PTH product from the previous cycle present in the current cycle). Only cycle 2 shows evidence of preview (PTH product from the following cycle present in the current cycle).

Separation  of  Edman  Degradation  Products  from  the  Highly 
Miniaturized  Sequencer 

Figure 2 presents a set of electropherograms generated from 890 pmol of insulin chain B. Tyrosine (Y) was added to each sample as an internal marker of retention time, and retention times were normalized to the DMPTU and tyrosine peaks. The first cycle shows a strong peak for phenylalanine (F), which persists as a small amount of lag in subsequent cycles. The second cycle shows a strong peak for valine (V), which again shows lag in subsequent cycles. The third cycle shows a medium-size peak for asparagine (N); although this peak is easily identified as the terminal residue, lag from previous cycles begins to confound the interpretation of the data. There are also two anomalous peaks in this electropherogram. The first, at about 5.3 minutes, appears to be due to the passage of a bubble through the detection chamber. The broad peak at 5.7 minutes is of unknown origin. By the fourth cycle, the peak from glutamine (Q) is present, although peaks from N and V, due to lag from previous cycles, dominate the electropherogram.

The automated sequencer clearly produces very clean electropherograms, with relatively few spurious signals from reagent impurities. However, lag from previous cycles is a serious problem with the current instrument. We have investigated a number of experimental parameters, and the lag does not appear to be associated with low coupling efficiency. Instead, the instrument appears to suffer from modest cleavage efficiency. The instrument is being modified to improve the efficiency of the cleavage step.
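Why modest cleavage efficiency shows up as lag that accumulates from cycle to cycle can be illustrated with a simple out-of-phase model. The sketch below is not the authors' analysis; it assumes complete coupling, no preview, no peptide washout and a fixed cleavage efficiency, with chains that escape cleavage simply retrying the same residue in the next cycle. The efficiency in the usage example is hypothetical.

```python
def simulate_release(n_residues: int, n_cycles: int, cleavage_eff: float) -> list[list[float]]:
    """Return released[c][k]: fraction of chains releasing residue k (0-based) at cycle c (0-based).

    Model: every chain is coupled each cycle; a fraction cleavage_eff of the chains
    at each position is cleaved (releasing that residue and advancing one position),
    the rest stay in place and lag by one more cycle.
    """
    # state[k]: fraction of chains whose current N-terminal residue is k; last slot = fully degraded
    state = [0.0] * (n_residues + 1)
    state[0] = 1.0
    released = []
    for _ in range(n_cycles):
        out = [0.0] * n_residues
        new_state = [0.0] * (n_residues + 1)
        for k in range(n_residues):
            cleaved = cleavage_eff * state[k]
            out[k] = cleaved
            new_state[k] += state[k] - cleaved  # uncleaved chains retry residue k
            new_state[k + 1] += cleaved
        new_state[n_residues] += state[n_residues]
        state = new_state
        released.append(out)
    return released


if __name__ == "__main__":
    seq = "FVNQ"  # first four residues of insulin chain B, as identified in Fig. 2
    for cycle, out in enumerate(simulate_release(len(seq), 4, cleavage_eff=0.8), start=1):
        print(cycle, ", ".join(f"{aa}={x:.2f}" for aa, x in zip(seq, out)))
```

In this model the in-phase share of the PTH released at cycle n is the efficiency raised to the power n - 1, so with a hypothetical efficiency of 0.8 only about 51% of the material released at cycle 4 comes from the expected residue; lag grows even though each individual cleavage step is fairly efficient.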


CONCLUSIONS 

We have reported the use of capillary electrophoresis for identification of PTH amino acid residues produced by both manual and automated protein sequencing. The electrophoresis system requires about 11 minutes to separate and identify the PTH amino acids.
Figure 1. Electropherograms for the manual sequence analysis of 865 nmol of a pentapeptide.


Furthermore, because the system does not require re-equilibration, a new sample may be analyzed immediately after completion of an electropherogram. The capillary electrophoresis system is much faster and simpler than gradient elution high performance liquid chromatography.

In  addition  to  highly  efficient  separations,  our  use  of  photothermal  absorbance 
detection  provides  high-sensitivity  analysis.  The  laser-based  detector  achieves  subfemtomole 
detection  limits  for  the  PTH  amino  acids.  However,  the  system  suffers  from  one 
important  limitation:  only  a  few  nanoliters  of  analyte  may  be  injected  onto  the  capillary 
without  introducing  an  unacceptable  amount  of  band  broadening. 
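Why a nanoliter injection limit matters can be seen from a simple volume budget: only the injected fraction of a sequencer extract ever reaches the detector. In the sketch below, the 5 nl injection, the 1 pmol release per cycle, and both extract volumes (50 µl and 0.5 µl) are illustrative assumptions, not values from this work; the helper names are hypothetical.

```python
def injected_fraction(extract_volume_l: float, injection_volume_l: float) -> float:
    """Fraction of a sequencer extract that actually reaches the capillary."""
    if not 0.0 < injection_volume_l <= extract_volume_l:
        raise ValueError("injection volume must be positive and no larger than the extract")
    return injection_volume_l / extract_volume_l


def moles_injected(released_mol: float, extract_volume_l: float,
                   injection_volume_l: float) -> float:
    """Moles of PTH amino acid injected, assuming a well-mixed extract."""
    return released_mol * injected_fraction(extract_volume_l, injection_volume_l)


INJECTION_L = 5e-9    # assumed 5 nl injection ("a few nanoliters")
RELEASED_MOL = 1e-12  # assumed 1 pmol of PTH released in one cycle

# Both extract volumes are illustrative assumptions, not values from this work.
for label, extract_l in [("50 ul extract", 50e-6), ("0.5 ul extract", 0.5e-6)]:
    fmol = moles_injected(RELEASED_MOL, extract_l, INJECTION_L) * 1e15
    print(f"{label}: {fmol:.1f} fmol injected")
```

Under these assumed volumes, a 50 µl extract delivers only about 0.1 fmol of a 1 pmol release to the capillary, whereas a 0.5 µl extract delivers about 10 fmol, which illustrates why the sequencer volume should be matched to the electrophoresis system.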

Figure  2.  Electropherograms  for  the  automated  sequence  analysis  of  890  pmol  of  insulin  chain  B.  Tyrosine  is 
added  to  each  sample  as  an  internal  standard. 


We  report  the  development  of  a  highly  miniaturized  protein  sequencer,  which  is 
matched  in  volume  to  the  volume  required  by  capillary  electrophoresis.  The  highly 
miniaturized  instrument  is  much  smaller  than  conventional  technology.  It  is  based  on  a  400-µm 
diameter  reaction  mat,  which  has  about  1/1000  the  cross-sectional  area  of  that  in  a  conventional 
sequencer.  This  minute  size  allows  a  three-order-of-magnitude  reduction  in  reagent 
consumption,  with  a  concomitant  reduction  in  contamination.  Two  important  steps  remain  in  our 
instrumentation  development  program.  First,  we  must  improve  the  cleavage  step  in  the 
miniaturized  sequencer  to  reduce  the  amount  of  lag  and  to  improve  the  overall  conversion 
efficiency.  Second,  we  need  to  couple  the  reaction  chamber  directly  to  the  electrophoresis 
system.  By  achieving  these  two  goals,  we  should  be  able  to  routinely  sequence  femtomole 
amounts  of  proteins. 
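Because cross-sectional area scales with the square of the diameter, the quoted 1/1000 area ratio can be turned into an equivalent diameter for the conventional reaction zone. The sketch assumes both reaction zones are circular; the function names are hypothetical.

```python
import math


def area_ratio(d_small: float, d_large: float) -> float:
    """Ratio of the cross-sectional areas of two circular reaction zones."""
    return (d_small / d_large) ** 2


def diameter_for_ratio(d_small: float, ratio: float) -> float:
    """Diameter of the larger zone that makes area_ratio(d_small, d) equal to ratio."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return d_small / math.sqrt(ratio)


d_mat = 400e-6  # 400-um reaction mat, from the text
d_conv = diameter_for_ratio(d_mat, 1 / 1000)
print(f"equivalent conventional diameter: {d_conv * 1e3:.1f} mm")
```

A 1/1000 area ratio corresponds to a conventional reaction zone about 12.6 mm across. If reagent volume scales with the wetted area (an assumption), the same factor gives the three-order-of-magnitude reduction in reagent consumption.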
INTRODUCTION 

The  Edman  degradation  (Edman,  1949)  has  been  the  most  successful,  general  and 
widely  used  technique  for  determining  the  amino  acid  sequence  of  proteins  and 
peptides.  Owing  to  this  widespread  use,  the  method  has  been  refined  to  a  high  degree  of 
perfection  over  the  last  four  decades.  Nevertheless,  sequencing  with  phenyl  isothiocyanate 
(PITC)^  suffers  from  a  few  practical  limitations.  First,  the  extinction  coefficient  of  the 
phenylthiohydantoins  (PTH’s)  limits  sequencing  sensitivity.  Currently,  routine  sequencing 
in  most  laboratories  requires  low  picomole  amounts  of  sample  applied  to  the  sequencer. 
Second,  UV-absorbing  products  that  may  co-elute  with  PTH’s  during  high  performance 
liquid  chromatography  (HPLC)  separation  tend  to  obscure  the  specific  PTH  signals  during 
high  sensitivity  sequencing.  Third,  with  the  exception  of  select  cases  (Wettenhall  et  al.,  1991; 
Meyer  et  al.,  1990,  1991;  Aebersold  et  al.,  1991;  Gooley  et  al.,  1991;  Pisano  et  al.,  1993), 
modified  and  unnatural  amino  acids  of  known  structure  are  difficult  to  identify,  and  de-novo 
characterization  of  such  residues  by  UV  absorbance  detection  alone  is  extremely  difficult. 


*  Correspondence  address:  Department  of  Molecular  Biotechnology,  University  of  Washington,  FJ-20, 
Seattle,  WA  98195. 

^ABBREVIATIONS:  PITC:  phenyl  isothiocyanate;  PTH:  phenylthiohydantoin;  HPLC:  high-performance 
liquid  chromatography;  ESI-MS:  electrospray  ionization  mass  spectrometer/metry;  MS/MS:  tandem  mass 
spectrometer/metry;  PETMA-PITC:  3-[4'-(ethylene-N,N,N-trimethylamino)phenyl]-2-isothiocyanate; 
PITC  311:  4-(3-pyridylmethylaminocarboxypropyl)phenyl  isothiocyanate;  RP-HPLC:  reverse-phase  high- 
performance  liquid  chromatography;  TFA:  trifluoroacetic  acid;  MeCN:  acetonitrile. 

To  overcome  these  limitations,  we  have  attempted  to  develop  a  new  protein  degradation 
chemistry.  In  particular,  the  aims  of  this  new  chemistry  were  to  achieve  higher 
sensitivity,  to  provide  enhanced  selectivity  for  detecting  the  specific  signal  among  the  products 
of  a  chemical  sequencing  cycle,  and  to  allow  structural  characterization  of  modified  residues. 
To  this  end,  we  endeavored  to  design  a  sequencing  reagent  that  would  generate  derivatives 
detectable  by  electrospray  ionization  mass  spectrometry  (ESI-MS).  The  femtomole-level 
detection  sensitivity  of  ESI-MS  is  well  documented,  and  mass  analysis  of  the  cleaved  and 
extracted  residues  is  expected  to  enhance  the  ability  to  identify  modified  residues  and  to 
extract  the  specific  signal  from  the  complex  chemical  mixture  generated  by  the  protein 
sequencer.  An  additional  intrinsic  capability  of  an  ESI-MS-based  sequencing  chemistry  is  the 
potential  for  de-novo  structure  determination  of  modified  residues  by  analysis  of  the  tandem 
MS  (MS/MS)  fragmentation  patterns  of  amino  acid  derivatives. 

As  part  of  ongoing  efforts  in  our  group  to  develop  and  improve  methods  of  protein 
structure  analysis,  we  report  the  synthesis,  evaluation  and  application  of  a  panel  of  reagents 
for  stepwise  degradation  of  polypeptides  and  analysis  of  the  resultant  derivatives  by  ESI-MS. 
We  describe  the  process  by  which  the  reagents  were  designed,  the  difficulties  that  arose  with 
specific  compounds,  and  the  evolution  toward  a  structure  that  met  the  intricate  requirements. 


A  NEW  PROTEIN  SEQUENCING  REAGENT:  EVOLUTION  OF  THE  DESIGN 

The  first  reagent  we  synthesized  and  reported  on  was  3-[4'-(ethylene-N,N,N-trimethylamino)phenyl]-2-isothiocyanate 
(PETMA-PITC)  (Aebersold  et  al.,  1992),  shown  in  Fig.  1,  structure  1.  The  molecule  was 
designed  to  include  three  specific  components,  each  serving  a  distinct  purpose.  The  PITC 
moiety  was  included  to  ensure  the  optimal  coupling  and  cleavage  kinetics  that  distinguish  the 
Edman  chemistry.  The  strongly  basic  functional  group,  a  quaternary  amine  in  the  case  of 
PETMA-PITC,  was  added  to  mediate  efficient  ionization  for  high  sensitivity  detection  by 
ESI-MS.  The  bridging  section,  an  ethyl  group  in  the  case  of  PETMA-PITC,  was  added  to 
ensure  steric  and  electronic  separation  of  the  PITC  and  the  basic  groups,  minimizing 
interference  between  the  two  functional  groups. 

Experiments  using  PETMA-PITC  demonstrated  that  the  resulting  amino  acid  derivatives 
could  be  detected  at  low  femtomole  sensitivities  by  ESI-MS  and  that  the  kinetic 
properties  of  the  reagent  were  comparable  with  those  of  PITC.  However,  when  PETMA-PITC 
was  tested  in  an  automated  protein  sequencer,  we  observed  two  limitations.  First,  the 
reagent  was  too  polar  to  be  compatible  with  common  absorptive  sequencing  protocols.  This 
prevented  the  use  of  this  chemistry  in  the  majority  of  sequencers  currently  in  use  without 
significant  modifications  to  hardware  and  protocols.  Second,  we  discovered  that  it  would  be 
preferable  for  the  sequencing  reagent  to  have  a  higher  molecular  weight,  so  that  the 
derivatives  would  appear  in  a  region  of  the  mass  spectrum  less  crowded  with  interfering 
background  contaminants. 

Figure  1.  Molecular  structure  of  reagents  synthesized  and  evaluated.  1:  PETMA-PITC;  2:  C10-PETMA-PITC 
(n  =  10);  3:  C5-PETMA-PITC  (n  =  5);  4:  pyridylmethyl  isothiocyanate;  5:  nicotinic  phenylisothiocyanate;  6: 
pyridylmethylphenylisothiocyanate;  7:  4-(3-pyridinylmethylaminocarboxypropyl)phenyl  isothiocyanate 
(PITC  311). 


To  reduce  polarity  and  to  increase  the  molecular  weight,  the  reagent  C10-PETMA-PITC 
(Fig.  1,  structure  2)  was  synthesized.  While  the  PITC  and  quaternary  amine  functional 
groups  were  maintained,  this  compound  differed  from  PETMA-PITC  by  a  ten-carbon  chain 
extension  attached  through  an  amide  bond  linkage.  With  this  reagent,  amino  acid  derivatives 
detectable  by  ESI-MS  were  formed  without  difficulty;  however,  the  products  were  found  to 
chromatograph  very  poorly  under  typical  reverse-phase  high 
performance  liquid  chromatography  (RP-HPLC)  conditions.  We  attributed  this  behavior 
to  the  formation  of  micelles,  a  structure  common  to  amphipathic  molecules.  To  reduce  the 
potential  for  micelle  formation,  we  next  shortened  the  chain  extension  to  five  carbons  to 
form  the  reagent  C5-PETMA-PITC  (Fig.  1,  structure  3).  This  was  achieved  using  the  same 
synthetic  steps  as  for  C10-PETMA-PITC,  except  that  a  starting  material  with  a  carbon  chain 
of  different  length  was  used.  This  “cassette-style”  process  for  creating  the  reagents 
expedited  synthesis  considerably.  Unfortunately,  as  was  the  case  with  C10-PETMA-PITC, 
the  C5-PETMA-PTH’s  generated  by  sequencing  polypeptides  with  C5-PETMA-PITC  were 
difficult  to  resolve  and  recover  by  RP-HPLC.  To  overcome  the  difficulties  associated  with 
the  strongly  polar  quaternary  amine,  one  of  us  (DJCP)  suggested  the  use  of  a  pyridyl  group 
as  a  mediator  of  ionization  in  ESI-MS.  In  contrast  to  quaternary  amines,  the  pyridyl  group  is 
not  formally  charged  under  typical  RP-HPLC  conditions,  suggesting  that  pyridyl-based 
reagents  could  be  better  suited  to  chromatographic  separation  and  absorptive  sequencing 
conditions.  Using  the  reagent  3-pyridylmethyl  isothiocyanate  (Fig.  1,  structure  4),  we  showed 
that  pyridyl-containing  amino  acid  derivatives  were  detectable  at  sensitivities  comparable 
to  those  of  derivatives  from  quaternary  amine-based  reagents.  Unfortunately,  this  reagent  had 
the  disadvantage  of  a  low  molecular  weight,  generating  amino  acid  derivatives  that  would 
appear  in  a  region  of  the  mass  spectrum  obscured  to  a  significant  degree  by  low  molecular 
weight  contaminants  of  unknown  origin  and  nature.  Our  course  at  this  stage  was  therefore  to 
create  a  molecule  combining  two  main  features:  (i)  a  pyridine  ring  for  ESI-MS  detection  and 
reduced  polarity  compared  with  a  positively  charged  quaternary  amine,  and  (ii)  a  higher 
molecular  weight  to  increase  the  organic  character  of  the  compound  and  to  shift  the 
derivatives  into  a  cleaner  region  of  the  mass  spectrum. 

The  first  reagent  incorporating  these  insights  was  nicotinic  phenylisothiocyanate 
(Fig.  1,  structure  5).  This  compound  was  derived  from  the  amide  linkage  of  nicotinic  acid 
and  p-nitrophenethylamine.  While  preliminary  results  demonstrated  desirable  coupling  and 
cyclization/cleavage  kinetics  as  well  as  good  detection  sensitivity  by  ESI-MS,  application 
of  this  reagent  to  automated  sequencing  revealed  chemical  instability  during  the  sequencing 
process.  We  learned  that  the  amide  bond  tended  to  cleave  (at  roughly  50%  yield)  upon 
exposure  to  trifluoroacetic  acid  (TFA).  Adjusting  the  sequencing  conditions  to  limit  TFA 
exposure  and  reduce  temperature  did  not  yield  significant  improvement. 

To  arrive  at  a  more  stable  structure  while  maintaining  the  desirable  characteristics  of 
nicotinic  phenylisothiocyanate,  we  synthesized  and  evaluated  4-pyridylmethyl 
phenylisothiocyanate  (Fig.  1,  structure  6).  Amino  acid  derivatives  prepared  with  this  reagent 
showed  favorable  chromatography  and  ESI-MS  detection  characteristics.  However,  the 
reagent  showed  poor  coupling  kinetics  in  manual  “bench-top”  coupling  reactions  as  well  as 
in  automated  peptide  sequencing.  We  attributed  this  to  the  spacer  group  containing  only  one 
carbon  atom. 

At  this  point,  we  decided  to  return  to  an  amide  bond  linking  group  similar  to  that  of 
nicotinic  phenylisothiocyanate,  but  with  two  modifications  aimed  at  preventing  the  amide 
cleavage.  The  first  change  was  to  insert  a  spacer  between  the  amide  bond  and  the  pyridine 
ring  to  attenuate  the  electron-withdrawing  effect  of  the  ring  on  the  amide  bond,  the  effect 
believed  to  be  responsible  for  weakening  the  amide  in  nicotinic  phenylisothiocyanate.  The 
second  change  was  to  reverse  the  sense  of  the  amide  bond  so  that  the  carbonyl  would  be 
even  further  removed  from  the  pyridine  ring.  With  this  rationale,  the  reagent 
4-(3-pyridylmethylaminocarboxypropyl)phenyl  isothiocyanate  (Fig.  1,  structure  7)  was 
synthesized.  While  this  name  is  a  chemically  accurate  description  of  the  reagent,  we  use  the 
simpler  name  PITC  311,  which  reflects  the  molecular  weight  of  the  compound.  It  was  with 
this  reagent  that  we  observed  superior  results  with  respect  to  chemical  stability,  reactivity 
and  chromatography,  and  we  therefore  proceeded  to  a  detailed  characterization  of  this 
reagent.  The  preliminary  evaluation  of  the  panel  of  compounds  described  above  with  respect 
to  molecular  weight,  polarity,  reaction  kinetics,  HPLC  chromatography,  chemical  stability 
and  mass  spectral  detectability  is  summarized  in  Fig.  2. 


ANALYSIS  OF  311  PTH  AMINO  ACID  DERIVATIVES  BY  ESI-MS 

Initially,  we  synthesized  thiohydantoins  of  the  20  naturally  occurring  amino  acids  and 
analyzed  the  products  by  ESI-MS  (Bures,  1994).  The  mass  spectrum  of  311  PTH  Val  shown 
in  Fig.  3  is  representative  of  the  typical  results  obtained  with  such  compounds.  The  measured 
mass  of  [M+H]+  =  411.0  corresponded  to  the  calculated  mass  of  the  molecule,  and  the 
molecule  displayed  only  very  limited  fragmentation  under  the  ionization  conditions  used. 
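The [M+H]+ value quoted for 311 PTH Val can be checked with a short mass calculation. This is a sketch under stated assumptions: the reagent formula C17H17N3OS is our reading of the chemical name (it gives a nominal mass of 311, consistent with the reagent's name), and each 311 PTH is assumed to follow ordinary PTH stoichiometry, i.e. reagent + amino acid - H2O, which equals reagent + residue. The helper names are hypothetical; the element and residue masses are standard monoisotopic values.

```python
# Monoisotopic element masses (Da) and the proton mass.
ELEMENT = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}
PROTON = 1.00727646

# PITC 311, 4-(3-pyridylmethylaminocarboxypropyl)phenyl isothiocyanate.
# Formula C17H17N3OS is derived here from the name (assumption).
REAGENT_FORMULA = {"C": 17, "H": 17, "N": 3, "O": 1, "S": 1}

# Standard monoisotopic amino acid residue masses (free amino acid minus H2O).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def formula_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of an elemental formula."""
    return sum(ELEMENT[el] * count for el, count in formula.items())


def pth311_mh(residue: str) -> float:
    """[M+H]+ of a 311 PTH derivative: reagent + residue + proton (assumed stoichiometry)."""
    return formula_mass(REAGENT_FORMULA) + RESIDUE[residue] + PROTON


print(f"311 PTH Val [M+H]+ = {pth311_mh('V'):.2f}")
```

The calculated monoisotopic value, about 411.2, agrees with the measured 411.0 to within roughly 0.2 Da. The same table places every common 311 PTH between about 369 (Gly) and 498 (Trp), inside the 365-755 Da window used for the total ion current below.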

Figure  3.  Structure  and  mass  spectrum  of  311  PTH  Val.  One  pmole  of  purified  311  PTH  Val  was  analyzed  by 
LC-MS.  The  sample  was  chromatographed  over  a  1x50  mm  Reliasil  BDS  C-18  column  at  a  flow  rate  of  50 
µl/min  using  a  TFA/acetonitrile  (MeCN)  solvent  system.  The  inset  shows  the  chemical  structure  of  the 
compound. 


We  next  evaluated  the  detection  sensitivity,  the  linearity  of  detector  response  and  the 
dynamic  range  of  the  detector.  Different  amounts  of  311  PTH’s,  ranging  from  50  fmole  to  10 
pmole,  were  applied  to  a  1  mm  i.d.  column  and  subjected  to  LC-ESI-MS  analysis.  The  results 
for  residues  with  acidic,  nonpolar  and  neutral-polar  side  chains  shown  in  Fig.  4  demonstrate 
that  311  PTH’s  are  detectable  by  ESI-MS  at  a  sensitivity  below  50  femtomoles  and  that  the 
detection  sensitivity  is  comparable  for  amino  acid  derivatives  with  acidic,  nonpolar  and 
neutral-polar  side  chains.  This  sensitivity  level  was  comparable  to  values  achieved 
previously  with  PETMA-PITC  and  supported  the  potential  for  protein  sequencing  at  enhanced 
sensitivity  using  PITC  311  and  ESI-MS  detection  of  311  PTH’s.  Furthermore,  the  detector 
response  was  linear  in  the  range  of  50  fmole  to  several  pmole,  and  the  dynamic  range  of 
detection  covered  three  orders  of  magnitude.  Finally,  it  is  important  to  note  that  the 
instrumental  conditions  employed  for  these  experiments  were  chosen  to  emulate  those  of  a 
sequencing  run  on  an  automated  sequencer.  In  particular,  the  system  was  compatible  with  the 
injection  of  sample  volumes  of  up  to  100  µl  without  loss  of  resolution  or  sensitivity 
(Hess  et  al.,  1994). 
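Linearity of detector response over a dilution series can be checked with a least-squares fit on log-log axes: a slope near 1 means the response is proportional to the amount injected. The sketch below is a generic check, not the authors' procedure; `loglog_fit` is a hypothetical helper, and no peak areas from this work are reproduced.

```python
import math


def loglog_fit(amounts_fmol: list[float], responses: list[float]) -> tuple[float, float]:
    """Least-squares slope and R^2 of log10(response) against log10(amount).

    A slope close to 1 (with R^2 close to 1) means the detector response is
    proportional to the amount injected over the tested range.
    """
    if len(amounts_fmol) != len(responses) or len(amounts_fmol) < 2:
        raise ValueError("need at least two paired points")
    if min(amounts_fmol) <= 0 or min(responses) <= 0:
        raise ValueError("amounts and responses must be positive")
    x = [math.log10(a) for a in amounts_fmol]
    y = [math.log10(r) for r in responses]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("amounts must not all be equal")
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return slope, r2
```

Fed with the integrated peak areas from a dilution series such as the 50 fmole to 10 pmole series above, a slope and R² both close to 1 would support the linearity claim. Fitting in log space keeps the low-femtomole points from being swamped by the picomole points.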


AUTOMATED  SEQUENCING  WITH  PITC  311 

Given  that  the  preliminary  testing  of  PITC  311  showed  promise,  the  next  goal 
was  to  couple  an  ESI-MS  to  an  automated  polypeptide  sequencer  to  test  the  chemistry  under 
“real”  microsequencing  conditions.  The  system  employed  is  presented  schematically 
in  Fig.  5.  A  commercially  available  sequencer  (Applied  Biosystems  model  477A)  was 
interfaced  on-line  with  the  LC-ESI-MS  configuration  used  for  the  PITC  311  evaluations 
described  above. 

Figure  4.  LC-ESI-MS  of  311  PTH’s.  Samples  of  the  indicated  derivatives,  in  the  amounts  indicated,  were 
chromatographed  over  a  1x50  mm  Reliasil  BDS  C-18  column  at  a  flow  rate  of  50  µl/min  and  analyzed  by 
multiple  ion  monitoring  LC-ESI-MS.  Samples  were  injected  with  a  50  µl  loop  and  subjected  to  a  14  minute 
gradient  using  a  TFA/MeCN  solvent  system.  Integrated  peak  values  are  shown.  The  system  consisted  of  a 
Michrom  UMA  HPLC  system  (Michrom  Bioresources)  and  an  API  III  triple  quadrupole  MS  (PE/SCIEX). 


To  assess  the  potential  of  the  ESI-MS  system  for  detecting  311  PTH’s  generated 
by  automated  sequencing,  we  applied  an  aliquot  of  a  synthetic  peptide  to  the  cartridge  of  the 
protein  sequencer,  subjected  the  sample  to  automated  sequencing  using  PITC  311,  and 
monitored  the  degradation  products  sequentially  by  UV  absorbance  detection  and  by 
ESI-MS.  The  data  shown  in  Fig.  6  compare  the  UV  absorbance  signals  in  the  first  two 
sequencing  cycles  (panels  A1,  A2),  the  total  ion  current  representing  all  ions  detected  by 
ESI-MS  in  the  mass  range  365-755  Da  for  the  same  cycles  (panels  B1,  B2),  and  the 
enhanced  MS  signal  generated  by  selected  ion  extraction  (panels  C1,  C2).  Comparison  of 
panels  A  and  B  in  Fig.  6  shows  that  most  of  the  contaminants  detected  at  a  relatively 
constant  level  by  UV  absorbance  during  the  sequencing  experiment  were  also  detected  by 
ESI-MS;  in  this  form,  the  ESI-MS  data  largely  resemble  the  UV  data.  Selective  monitoring 
of  the  ions  corresponding  to  the  expected  311  PTH’s  and  their  adducts  (panels  C1,  C2) 
dramatically  enhanced  signal  levels,  suggesting  that  MS  detection  of  the  products  of  chemical 
peptide  degradation  will  be  advantageous  for  high  sensitivity  sequencing  experiments. 
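The difference between rows B and C of Fig. 6 is that between summing every ion in a mass range and summing only ions near the expected 311 PTH masses. A minimal sketch of both operations on full-scan data follows; the scan layout, the helper names and the default ±0.5 Da extraction window (roughly unit resolution on a quadrupole) are assumptions for illustration, not the acquisition software used in this work.

```python
# A full-scan spectrum: (retention time in min, [(m/z, intensity), ...]).
Scan = tuple[float, list[tuple[float, float]]]


def total_ion_current(scans: list[Scan], mz_min: float,
                      mz_max: float) -> list[tuple[float, float]]:
    """Summed intensity of all ions inside [mz_min, mz_max] for each scan (cf. row B)."""
    return [(rt, sum(i for mz, i in peaks if mz_min <= mz <= mz_max))
            for rt, peaks in scans]


def extracted_ion_chromatogram(scans: list[Scan], targets: list[float],
                               tol: float = 0.5) -> list[tuple[float, float]]:
    """Summed intensity within +/- tol of any target m/z for each scan (cf. row C).

    `targets` can hold the [M+H]+ of a 311 PTH together with its adducts.
    """
    return [
        (rt, sum(i for mz, i in peaks if any(abs(mz - t) <= tol for t in targets)))
        for rt, peaks in scans
    ]
```

Restricting the sum to a few narrow windows discards the chemical background that dominates the total ion current, which is why the extracted traces show the specific 311 PTH signal so much more clearly.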

Figure  5.  Schematic  of  the  system  for  PITC  311  sequencing.  The  operation  of  the  sequencer,  HPLC  and  MS  was 
controlled  by  their  respective  controllers.  Synchronization  was  achieved  by  using  events  A  and  B  of  the  model 
477A  sequencer  to  start  the  HPLC  and  the  MS,  respectively.  The  311  PTH’s  were  transferred  by  argon  pressure 
from  the  sequencer  through  a  standard  Teflon  injector  line  into  the  injection  loop  of  the  HPLC  using  the  manual 
injection  port.  The  transfer  delay  was  approximately  8  s,  after  which  the  HPLC  injection  was  triggered  and  the 
MS  data  acquisition  was  subsequently  started  by  contact  closure  signals.  A  fused  silica  capillary  with  an  inner 
diameter  of  75  µm  was  used  to  connect  the  UV  cell  of  the  HPLC  with  the  ESI-MS  ion  source.  Liquid  connections 
between  instruments  are  shown  as  solid  lines;  electrical  connections  are  represented  by  broken  lines.  The  flow 
splitter  between  the  HPLC  unit  and  the  electrospray  ionization  interface  was  optional  and  did  not  affect  the 
performance  of  the  system. 


Figure  6.  Signal  enhancement  by  ESI-MS  detection  of  311  PTH’s.  A  synthetic  peptide  was  subjected  to 
automated  sequence  analysis  using  PITC  311,  and  the  resulting  311  PTH’s  were  monitored  sequentially  by  UV 
absorbance  and  ESI-MS.  Results  from  the  first  two  sequencing  cycles  (1,  2)  are  displayed.  Row  A:  UV 
absorbance  detection  of  311  PTH’s.  Row  B:  Total  ion  current  of  311  PTH  detection;  the  mass  range  displayed  is 
365-755  Da.  Row  C:  Extraction  of  acquired  MS  data  for  the  masses  corresponding  to  311  PTH’s  of  naturally 
occurring  amino  acids.  Peaks  are  designated  with  the  one  letter  code  of  the  corresponding  amino  acid  and  the 
mass  of  the  compound. 


Figure  7.  Subpicomole  peptide  sequencing  with  PITC  311.  A  500  fmole  amount  of  a  synthetic  decapeptide 
with  the  sequence  NH2-Val-Gln-Ala-Ala-Ile-Asp-Tyr-Ile-Asn-Gly-CO2H  was  subjected  to  automated  sequence 
analysis  using  PITC  311,  and  the  resultant  311  PTH’s  were  monitored  by  ESI-MS  operated  in  the  multiple  ion 
monitoring  mode.  Each  panel  depicts  a  cycle-by-cycle  histogram  of  the  abundance  of  ESI-MS  signal 
corresponding  to  the  amino  acid  indicated  by  the  single  letter  code  in  the  upper  right-hand  corner  of  the  graph. 
The  darkly  shaded  bars  indicate  the  amino  acid  residue  specific  for  the  respective  sequencing  cycle. 


In  an  experiment  designed  to  evaluate  the  level  of  sequencing  sensitivity  achievable 
using  ESI-MS  detection  of  311  PTH’s,  we  applied  decreasing  amounts  (calibrated  by 
quantitative  amino  acid  composition  analysis)  of  a  synthetic  decapeptide  with  the  sequence 
NH2-Val-Gln-Ala-Ala-Ile-Asp-Tyr-Ile-Asn-Gly-CO2H  to  the  cartridge  of  the  protein 
sequencer  and  sequenced  the  peptide  with  PITC  311  under  the  conditions  described  above. 
The  results  of  an  experiment  in  which  500  fmole  of  peptide  was  covalently  attached  to  an 
Arylamine  Immobilon  disc  (Millipore)  (Coull,  1991)  and  applied  to  the  sequencer  are  shown 
in  Fig.  7. 
The  311  PTH  mass  provided  a  third  data  dimension  in  addition  to  the  RP-HPLC 
retention  time  and  the  signal  intensity.  To  reduce  complexity,  we  displayed  the 
sequencing  data  as  a  series  of  two-dimensional  histograms,  each  representing  the 
abundance  of  ESI-MS  signal  corresponding  to  a  particular  amino  acid  derivative 
as  a  function  of  the  sequencing  cycle.  The  data  in  Fig.  7  illustrate  that  femtomole 
amounts  of  peptide  can  be  sequenced  by  this  system.  Whereas  the  background  signal 
for  most  311  PTH’s  is  relatively  constant  and  low,  we  observed  a  contaminant  isobaric 
to  311  PTH  Tyr  that  almost  co-eluted  with  311  PTH  Tyr,  obscuring  the  specific 
signal  in  cycle  7.  We  determined  that  this  contaminant  was  associated  with  certain 
batches  of  PITC  311  and  could  be  removed  by  additional  purification  of  the  reagent. 
The  aspartic  acid  (D)  signal  in  cycle  6  was  suppressed  because  the  residue  was  covalently 
attached  to  the  support  membrane  during  sequencing.  While  covalent  sample  attachment 
is  not  a  requirement  for  the  PITC  311  chemistry,  it  was  advantageous  in  the  early 
sequencing  experiments,  prior  to  optimization  of  absorptive  sequencing  protocols. 
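Reading a sequence from such cycle-by-cycle histograms amounts to asking, for each cycle, which derivative rose most above its level in the previous cycle. The sketch below implements that simple difference rule; it is a common heuristic offered for illustration, not the calling procedure used by the authors, and `call_sequence` and its input layout are hypothetical.

```python
def call_sequence(signal: dict[str, list[float]]) -> str:
    """Call one residue per cycle from per-residue, per-cycle MS signal.

    `signal` maps a one-letter code to its signal in cycles 1..n. Cycle 1
    calls the strongest signal; each later cycle calls the residue whose
    signal rose most over the previous cycle, which discounts constant
    background and part of the lag carried over from earlier cycles.
    """
    if not signal:
        raise ValueError("no residues supplied")
    n_cycles = len(next(iter(signal.values())))
    if any(len(v) != n_cycles for v in signal.values()):
        raise ValueError("every residue needs one value per cycle")
    calls = []
    for i in range(n_cycles):
        # Rise over the previous cycle; cycle 1 is compared with zero.
        def rise(aa: str, i: int = i) -> float:
            previous = signal[aa][i - 1] if i > 0 else 0.0
            return signal[aa][i] - previous
        calls.append(max(signal, key=rise))
    return "".join(calls)
```

The rule has an obvious weakness: a residue repeated in consecutive cycles, such as Ala-Ala in the decapeptide above, rises only modestly in the second cycle, so a practical caller would need to model lag explicitly rather than rely on simple differences.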

While  these  results  are  an  encouraging  demonstration  of  the  potential  of  PITC  311 
chemistry  for  high  sensitivity  sequencing,  the  use  of  a  synthetic  peptide  substrate  in 
tightly  controlled  test  experiments  precludes  an  assessment  of  the  generality  of  the 
procedure.  The  key  information  for  scientists  working  with  proteins  is  how  much  material 
is  needed  to  obtain  N-terminal  or  internal  sequence  data.  To  address  this  practical  issue,  we 
used  trypsin  to  cleave  decreasing  amounts  (calibrated  by  quantitative  amino  acid  composition 
analysis)  of  bovine  carbonic  anhydrase,  separated  the  resultant  fragments  by  microbore 
RP-HPLC,  and  sequenced  selected  collected  peptides  using  PITC  311  as  described  above. 
The  resultant  degradation  products  were  analyzed  by  ESI-MS,  and  the  data  are  displayed  as 
described  above  for  Fig.  7.  The  data  shown  in  Fig.  8  represent  the  sequence  of  a  peptide 
obtained  by  cleavage  of  1.2  pmole  of  protein.  Given  the  losses  associated  with  HPLC 
purification,  collection  and  transfer  of  the  peptide  into  the  sequencer,  only  femtomole 
amounts  of  peptide  were  actually  sequenced.  The  darkly  shaded  bars  in  Fig.  8  indicate  that 
the  sequence  could  be  called  easily  and  unambiguously  even  at  that  sensitivity  level. 
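The argument that only femtomole amounts reached the sequencer is multiplicative: each handling step passes on only a fraction of the material it receives. The sketch below uses purely illustrative recoveries (0.8, 0.5 and 0.5); they are assumptions, not measurements from this work, and `amount_after_losses` is a hypothetical helper.

```python
from math import prod


def amount_after_losses(start_fmol: float, recoveries: dict[str, float]) -> float:
    """Amount surviving a series of handling steps with the given fractional recoveries."""
    for step, r in recoveries.items():
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"recovery for {step!r} must be between 0 and 1")
    return start_fmol * prod(recoveries.values())


# Illustrative recoveries only (assumed, not measured in this work).
steps = {"digestion": 0.8, "HPLC purification": 0.5, "collection and transfer": 0.5}
print(f"{amount_after_losses(1200.0, steps):.0f} fmol reach the sequencer")
```

With these assumed recoveries, a digest of 1.2 pmole (1200 fmole) of protein would deliver about 240 fmole of a given peptide to the sequencer; any real set of step losses compounds in the same way.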


SUMMARY 

We  have  synthesized  and  evaluated  a  panel  of  novel  protein  sequencing  reagents 
designed  to  yield  amino  acid  derivatives  detectable  at  the  low  femtomole  level  by  ESI-MS. 
Polypeptide  degradation  with  these  reagents  is  based  on  the  phenylisothiocyanate 
functionality  introduced  by  Edman  (Edman,  1949).  The  chemistries  were  therefore  easily 
adapted  to  automated  stepwise  degradation.  Through  a  systematic  process,  we  have  arrived 
at  a  new  reagent,  PITC  311,  that  permits  a  sequencing  approach  incorporating  ESI-MS 
detection.  Employing  this  approach,  we  have  shown  that  PITC  311  is  compatible  with 
femtomole-level  peptide  sequencing.  Additionally,  we  have  demonstrated  that  the  mass 
information  provided  by  ESI-MS  detection  increases  confidence  in  data  interpretation. 
Similarly,  mass  information  available  from  ESI-MS  analysis  of  311  PTH’s  assists  in  the 
characterization  of  modified  and  unnatural  amino  acid  residues.  In  future  work  with  this 
reagent,  we  aim  to  optimize  automated  sequencing  cycles  for  high  sensitivity  protein 
sequencing.  We  also  intend  to  develop  a  methodology  for  applying  PITC  311  to  high 
sensitivity  absorptive  sequencing,  and  to  create  rapid  sequencing  protocols. 
Figure  8.  High  sensitivity  PITC  311  sequencing.  A  1.2  pmole  sample  of  bovine  carbonic  anhydrase  was 
cleaved  with  trypsin,  and  the  resultant  peptide  fragments  were  separated  by  RP-HPLC  and  collected  manually. 
One  of  these  collected  peptides  was  subjected  to  automated  sequence  analysis  using  PITC  311,  and  the  resultant 
311  PTH’s  were  monitored  by  ESI-MS  operated  in  the  multiple  ion  monitoring  mode.  Each  panel  depicts  a 
cycle-by-cycle  histogram  of  the  abundance  of  ESI-MS  signal  corresponding  to  the  amino  acid  indicated  by  the 
single  letter  code  in  the  upper  left-hand  corner  of  the  graph.  The  darkly  shaded  bars  indicate  the  amino  acid 
residue  specific  for  the  respective  sequencing  cycle.  ΔS:  311  PTH  dehydroserine. 
SUMMARY 

The  three  major  groups  of  glycosylated  phenylthiohydantoin  (PTH)  derivatives, 
Asn(Sac),  Ser(Sac)  and  Thr(Sac),  can  be  clearly  resolved  and  separated  from  the  other  20 
commonly  occurring  PTH-amino  acids  using  a  new  5  mM  triethylammonium  formate 
(TEAF)  buffer,  pH  4.0,  with  an  acetonitrile  gradient.  The  glycosylated  amino  acids  elute  early, 
in  a  1.5-min  “glycosylation  window”  between  6.5  and  8  min,  while  all  the  other  PTH-amino 
acids  elute  between  8  and  15  min.  This  buffer  system  was  developed  principally  for  its  ability 
to  separate  all  PTH-amino  acids  and  glycoamino  acids  at  low  ionic  strength.  The  low  buffer 
concentration  is  necessary  to  minimize  glucose  contamination  for  monosaccharide  analysis 
of  the  PTH-glycoamino  acids. 

We  demonstrate  that:  (a)  a  TEAF  buffer  system  is  compatible  with  monosaccharide 
analysis  of  the  PTH-glycoamino  acids  and,  in  principle,  the  volatile  nature  of  the  buffer  makes 
it  suitable  for  ionspray  mass  spectrometric  analysis  of  recovered  PTH-glycoamino  acids;  and 
(b)  the  “glycosylation  window”  is  important  for  the  detection  of  site-specific  partial 
glycosylation  and  for  identifying  different  forms  of  PTH-glycoamino  acids. 
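The retention-time logic of the glycosylation window can be written down directly: peaks eluting between 6.5 and 8 min are candidate PTH-glycoamino acids, and peaks between 8 and 15 min are unmodified PTH-amino acids. The window limits come from the text; the helper names, the 0.1-min matching tolerance, and the rule of flagging partial glycosylation when both forms appear in one cycle are illustrative assumptions.

```python
GLYCO_WINDOW = (6.5, 8.0)      # min, PTH-glycoamino acids (from the text)
STANDARD_REGION = (8.0, 15.0)  # min, the other PTH-amino acids (from the text)


def classify_peak(rt_min: float) -> str:
    """Assign a peak to the glycosylation window, the standard region, or neither."""
    lo_g, hi_g = GLYCO_WINDOW
    lo_s, hi_s = STANDARD_REGION
    if lo_g <= rt_min < hi_g:
        return "glycosylated"
    if lo_s <= rt_min <= hi_s:
        return "unmodified"
    return "outside"


def partially_glycosylated(peak_rts: list[float], expected_pth_rt: float,
                           tol: float = 0.1) -> bool:
    """True if one cycle shows both a glycosylation-window peak and the unmodified PTH.

    `expected_pth_rt` is the standard retention time of the unmodified
    PTH-Asn, -Ser or -Thr; `tol` is an assumed matching tolerance in minutes.
    """
    has_glyco = any(classify_peak(rt) == "glycosylated" for rt in peak_rts)
    has_free = any(abs(rt - expected_pth_rt) <= tol for rt in peak_rts)
    return has_glyco and has_free
```

Because the two forms elute in non-overlapping regions, a site that is only partly glycosylated yields two separate peaks in the same cycle instead of one ambiguous peak.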


INTRODUCTION 

Bioactive  proteins  are  commonly  glycosylated,  and  in  many  cases  the  glycosylation 
is  important  for  stability,  secretion,  biological  activity,  recognition  and  cell-cell  interactions 
(Williams  and  Barclay,  1988;  Mallett  and  Barclay,  1991).  However,  in  many  other  instances 
the  glycosylation  has  not  been  assigned  to  specific  residues,  and  usually  only  the  pooled 
oligosaccharides  from  the  protein  are  characterized  (Dwek  et  al.,  1993).  The  control  of 
glycosylation  is  also  becoming  increasingly  important  in  the  biotechnology  industry  with 
the  use  of  eukaryotic  expression  systems,  where  the  rules  for  in  vivo  glycosylation,  especially 
of  O-linked  oligosaccharides,  are  only  now  being  understood  (Gooley  and  Williams,  1994). 

The  traditional  approach  to  the  study  of  carbohydrates  attached  to  proteins  and 
peptides  has  been  to  release  them  from  the  polypeptide  chain  by  chemical  or  enzymatic 
treatment  and  then  characterize  the  oligosaccharides/monosaccharides  separately  by  high 
performance  anion  exchange  chromatography  with  pulsed  amperometric  detection  (HPAEC- 
PAD)  (Townsend  et  al.,  1989;  Townsend  and  Hardy,  1991),  fluorophore  assisted  carbohydrate 
electrophoresis  (FACE™)  (Jackson,  1991)  or  mass  spectrometry  (Carr  et  al.,  1993). 
However,  glycoproteins  often  contain  a  heterogeneous  collection  of  both  N-linked  and 
O-linked  oligosaccharides,  and  the  release  of  glycans  of  a  particular  class  provides  no  specific 
information  concerning  site-specific  glycosylation. 

Solid-phase  Edman  degradation  is  a  powerful  tool  for  the  identification  and  quantitation 
of  sites  of  glycosylation,  as  individual  glycoamino  acids  are  recovered  for  monosaccharide 
analysis  and  mass  analysis  (Gooley  et  al.,  1994a;  Gooley  et  al.,  1994b).  It  is  also  the 
only  method  available  for  the  characterization  of  clustered  sites  of  glycosylation  found  in 
glycoproteins  including  mucins,  the  extracellular  domain  of  human  glycophorin  A  and  the 
macroglycopeptide  of  bovine  κ-casein  (Pisano  et  al.,  1993;  Pisano  et  al.,  1994). 

Phenylthiohydantoin  (PTH)-amino  acids  are  easily  separated  by  reversed-phase 
HPLC,  and  the  most  popular  buffer  systems  use  sodium  acetate  in  tetrahydrofuran.  This 
provides  a  compact  chromatogram  without  a  suitable  “window”  for  the  early  elution  of  polar 
modified  amino  acids  such  as  glycosylated  Asn,  Ser  and  Thr.  Two  alternative  solvent  systems 
have  been  proposed  recently.  The  first  is  a  35  mM  ammonium  acetate,  pH  4.9/acetonitrile 
system  recommended  by  Millipore/BioSearch  for  the  MilliGen  ProSequenator™  and  used 
by  Gooley  et  al.  (1991)  and  Pisano  et  al.  (1993)  for  the  identification  of  glycosylated 
PTH-amino  acids.  This  system  was  found  to  be  unsuitable  for  monosaccharide  analysis 
because  of  high  glucose  contamination  on  hydrolysis  (Gooley  et  al.,  1994a).  Because  of 
insufficient  resolution,  it  also  allows  neither  the  unambiguous  assignment  of  glycosylated 
residues  nor  the  detection  of  low  levels  of  glycosylation.  The  second  solvent  system 
(Strydom,  1994),  which  uses  a  triethylamine-phosphate  buffered  methanol/acetonitrile 
mixture  as  solvent  A  and  a  methanol/isopropanol/water  mixture  as  solvent  B,  has 
not  been  systematically  evaluated  for  extraneous  sugar  content,  nor  have  the  elution  positions 
of  glycosylated  Ser/Thr  residues  been  established. 

For the detection, correct assignment and characterization of glycosylated PTH-amino acids by on-line reversed-phase C18 HPLC during routine N-terminal sequence analysis, it is advantageous to have a chromatographic system that:

a. identifies each type of glycosylated amino acid;

b. separates the PTH-glycoamino acids from the PTH-amino acids, to allow the detection of partially glycosylated sites;

c. separates the 20 non-glycosylated PTH-amino acids;

d. uses a solvent system that is low in glucose contamination, to enable monosaccharide analysis; and

e. is compatible with ion-spray mass spectrometry.

Here we propose a simple, low-molarity triethylammonium formate (TEAF)/acetonitrile system, which meets these criteria, for the routine detection and characterization of glycosylated PTH-amino acids. We also show that the PTH chromophore conjugated to a glycoamino acid can be used as an effective "tag" for obtaining structural information on specific sites of glycosylation following exoglycosidase treatment of glycopeptides.
MATERIALS AND METHODS
Materials 

PTH-amino acid standards were from Applied Biosystems (Div. of Perkin-Elmer, CA); analytical grade formic acid, 90%, was from Ajax Chemicals (Australia); triethylamine, sequencing grade (Cat. no. 25108), was from Pierce; human glycophorin A (Cat. no. G-9511), ovomucoid (Cat. no. T-2011) and bovine κ-casein macroglycopeptide (Cat. no. C-7278) were purchased from Sigma. Recombinant PsA (rPsA), a secreted form of a cell surface glycoprotein of Dictyostelium discoideum, was prepared by the method of Zhou-Chou et al. (1994). β-Galactosidase from Diplococcus pneumoniae was from Boehringer Mannheim (Cat. no. 188718). Trifluoroacetic acid (TFA) was obtained from Sigma-Aldrich.

Preparation  of  Tryptic  Glycopeptides 

The tryptic glycopeptides from the human serum albumin mutant Casebrook (Arg485-Lys500) and rPsA (Ile88-Lys122) were prepared according to the method of Gooley et al. (1994a), except that cysteines were alkylated with 4-vinylpyridine according to Tarr (1986).

Preparation of Glycopeptide from Bovine κ-Casein Macroglycopeptide

The bovine κ-casein Met106-Thr124 glycopeptide was prepared from bovine κ-casein macroglycopeptide by C18 (Sephasil™, Pharmacia-Biotech) reversed-phase chromatography (SMART™ system, Pharmacia-Biotech) according to Pisano et al. (1994).

Desialylation  of  Glycopeptides/Glycoproteins 

Between 0.5 and 3 nmol (20-100 µl) of glycopeptide or glycoprotein in 20% (v/v) acetonitrile was mixed with an equal volume of 0.2 M TFA and incubated at 80°C for 1 h to remove sialic acids. The sample was then diluted 1 in 10 (v/v) with 20% (v/v) acetonitrile and concentrated to approximately 10 µl in a vacuum centrifuge. The dilution and concentration steps were repeated once.

β-Galactosidase Digestion of Desialylated Albumin Casebrook Tryptic Glycopeptide

Approximately 3 nmol of desialylated albumin Casebrook glycopeptide (Arg485-Lys500) was dissolved in 150 µl of 40 mM acetate buffer, pH 6, containing 10% (v/v) acetonitrile. The glycopeptide solution was then divided into three 50 µl aliquots. To the first aliquot, 2 mU of β-galactosidase was added and the mixture was incubated for 2 h at 37°C to obtain a partial digest. To the second aliquot, 2 mU of β-galactosidase was added, and the third aliquot was kept without enzyme as a control; these two samples were incubated at 37°C for 24 h. The glycopeptide was separated from the β-galactosidase by C18 (Sephasil™, Pharmacia-Biotech) reversed-phase chromatography (SMART™ system, Pharmacia-Biotech) using a 30 min linear gradient, with 0.05% (v/v) TFA as solvent A and 85% (v/v) acetonitrile + 0.045% (v/v) TFA as solvent B.
Covalent Attachment and Solid-Phase Edman Degradation

Human Glycophorin A (GpA). Between 0.5 and 2 nmol of desialylated human glycophorin A was dissolved in 20% (v/v) acetonitrile and covalently attached to Sequelon AA™ membranes via the side-chain carboxyl groups using water-soluble N-ethyl-N'-dimethylaminopropylcarbodiimide (EDC). The coupling reaction was carried out by the addition of 5 µl of coupling buffer (0.2 mg EDC/µl) at 4°C for 15 min, as described by Liang and Laursen (1990). The coupling buffer used was that supplied by MilliGen/BioSearch in the Sequelon AA™ attachment kit. The coupling reaction was terminated by vortexing the Sequelon AA™ membranes in 1 ml of 50% (v/v) methanol, followed by 1 ml of methanol, and the membranes were then dried at 55°C.

Bovine κ-Casein Glycopeptides and Tryptic Peptides of Serum Albumin Casebrook and rPsA. Between 0.2 and 1 nmol of desialylated glycopeptide was covalently attached to Sequelon AA™ membranes using the manufacturer's recommended procedure (see the Sequelon AA™ attachment kit User's Guide), with the incubation carried out at 4°C to increase the coupling yield, as recommended by Laursen et al. (1991).

Sequelon AA™-coupled protein/peptide membranes were subjected to automated solid-phase Edman degradation using a MilliGen ProSequencer™ 6600 with the standard 6600B method supplied by the manufacturer. The PTH-glycoamino/amino acid derivatives were transferred directly from the conversion flask to the on-line HPLC system.

On-line HPLC

The on-line HPLC system consisted of a Waters 600 multisolvent pump delivery system supported by a Waters 600-MS system controller and a Waters 490E programmable multiwavelength detector set at 269 nm and 313 nm. The PTH-amino acids were separated by on-line reversed-phase chromatography on a 3.9 mm × 300 mm C18 Nova-Pak (Waters) column.

Solvent A: 5 mM TEAF buffer was prepared by adding 300 µl of formic acid to 1.2 l of degassed MilliQ water and adjusting the pH to 4.0 with triethylamine (620 µl). Solvent B: 100% acetonitrile (Ajax Chemicals, Australia). Both solvent reservoirs were kept under a constant helium head pressure of approximately 20 kPa during HPLC operation. Optimal separation of PTH-glycoamino/amino acids was achieved by modifying the manufacturer's gradient (see Table 1).
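As a rough check on the solvent A recipe, the concentrations implied by the dispensed volumes can be estimated. This is a minimal sketch under assumptions that are not in the text: 90% (w/w) formic acid with a density of about 1.20 g/ml, triethylamine with a density of about 0.726 g/ml, a formic acid pKa of about 3.75, and ideal (activity-free) Henderson-Hasselbalch behaviour.

```python
# Rough stoichiometry check of the solvent A recipe (5 mM TEAF, pH 4.0).
# Assumed constants (NOT given in the text): reagent densities, the
# formic acid pKa, and ideal behaviour (activity effects ignored).

FORMIC_MW = 46.03       # g/mol
FORMIC_DENSITY = 1.20   # g/ml, assumed for ~90% (w/w) formic acid
FORMIC_PURITY = 0.90    # mass fraction ("formic acid, 90%")
TEA_MW = 101.19         # g/mol, triethylamine
TEA_DENSITY = 0.726     # g/ml, assumed
FORMIC_PKA = 3.75       # assumed literature value


def millimolar(volume_ul, density, purity, mw, final_volume_l):
    """Concentration (mM) of a liquid reagent dispensed by volume."""
    grams = volume_ul / 1000.0 * density * purity
    return grams / mw * 1000.0 / final_volume_l


def formate_fraction(ph, pka=FORMIC_PKA):
    """Henderson-Hasselbalch: fraction of total formic acid present as formate."""
    ratio = 10 ** (ph - pka)  # [HCOO-]/[HCOOH]
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    total = millimolar(300, FORMIC_DENSITY, FORMIC_PURITY, FORMIC_MW, 1.2)
    tea = millimolar(620, TEA_DENSITY, 1.0, TEA_MW, 1.2)
    formate = total * formate_fraction(4.0)
    print(f"formic acid + formate: {total:.2f} mM")
    print(f"triethylamine added:   {tea:.2f} mM")
    print(f"formate at pH 4.0:     {formate:.2f} mM")
```

With these assumed constants the script reports about 5.87 mM total formate, 3.71 mM triethylamine and 3.75 mM formate. Because triethylamine is essentially fully protonated at pH 4.0, the near-equality of the last two figures is roughly what charge balance predicts, so under these assumptions the recipe and the stated pH are mutually consistent and the "5 mM" label reads as a nominal value.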


Table 1. Gradient conditions for PTH-amino/glycoamino acid separation*

Time (min)    Solvent A (%)    Solvent B (%)
Initial       95               5
0.7           80               20
1.4           73               27
2.8           73               27
5.7           55               45
7.4           55               45
8.1           53               47
12            20               80
20            95               5

*Flow rate of 0.7 ml/min.
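Table 1 can be expressed as a piecewise profile to read off the programmed %B at any time, for example at the retention times of the PTH-glycoamino acids. This sketch assumes linear ramps between rows, including the 12 to 20 min return to starting conditions; the text does not state the curve type, and that final segment may in practice be a step followed by re-equilibration.

```python
from bisect import bisect_right

# Table 1 as (time in min, % solvent B); "Initial" is taken as t = 0.
GRADIENT = [
    (0.0, 5), (0.7, 20), (1.4, 27), (2.8, 27), (5.7, 45),
    (7.4, 45), (8.1, 47), (12.0, 80), (20.0, 5),
]
FLOW_ML_PER_MIN = 0.7  # from the Table 1 footnote
TIMES = [t for t, _ in GRADIENT]


def percent_b(t):
    """% solvent B at time t (min), assuming linear ramps between table rows."""
    if t <= TIMES[0]:
        return float(GRADIENT[0][1])
    if t >= TIMES[-1]:
        return float(GRADIENT[-1][1])
    i = bisect_right(TIMES, t) - 1
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


def solvent_b_volume_ml(t_end):
    """Acetonitrile delivered from t = 0 to t_end (exact for linear segments)."""
    area = 0.0  # integral of %B over time, in %·min
    for (t0, b0), (t1, _) in zip(GRADIENT, GRADIENT[1:]):
        if t0 >= t_end:
            break
        t_stop = min(t1, t_end)
        area += (b0 + percent_b(t_stop)) / 2.0 * (t_stop - t0)
    return area / 100.0 * FLOW_ML_PER_MIN


if __name__ == "__main__":
    for t in (6.85, 7.5, 8.0, 9.16):  # glycoamino / amino acid retention times
        print(f"{t:5.2f} min -> {percent_b(t):.1f}% B")
    print(f"acetonitrile per 20 min program: {solvent_b_volume_ml(20):.2f} ml")
```

The %B returned is only the programmed value; the system dwell volume (not given in the text) means the composition reaching the column lags behind the pump program.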
Calculation of Corrected Yields

The combined peak areas of a completely glycosylated PTH-Thr(Sac)142 site in bovine κ-casein macroglycopeptide were found to be equivalent to 0.83 of the yield of PTH-Val (Pisano et al., 1994). Therefore a correction factor of 1.2 (1/0.83) was used to convert the area of PTH-Thr(Sac) to pmol, and this was applied to the Met106-Thr124 glycopeptide to determine the extent of glycosylation at Thr121.
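The correction described above amounts to a few lines of arithmetic. In this sketch the factor of 1.2 comes from the text; the way the expected yield at a given cycle is obtained is not stated, so the repetitive-yield model (yield at cycle n = I0 × RY^(n-1)) and all the numbers in the example are illustrative assumptions.

```python
# Thr(Sac) quantitation as described in the text: PTH-Thr(Sac) areas are
# converted to pmol with the PTH-Val response and multiplied by 1.2
# (1/0.83). How the "expected" yield at a cycle is obtained is not stated;
# the repetitive-yield model below is an illustrative assumption.

THR_SAC_CORRECTION = 1.2  # from the text (1 / 0.83 = 1.20)


def thr_sac_pmol(glycoform_areas, pmol_per_area_val):
    """Summed Thr(Sac) glycoform peak areas -> pmol, corrected for response."""
    return sum(glycoform_areas) * pmol_per_area_val * THR_SAC_CORRECTION


def expected_yield(initial_pmol, repetitive_yield, cycle):
    """Assumed model: yield at cycle n = I0 * RY**(n - 1)."""
    return initial_pmol * repetitive_yield ** (cycle - 1)


def percent_glycosylated(glyco_pmol, expected_pmol):
    """Share of the residue recovered as the glycosylated form (%)."""
    return 100.0 * glyco_pmol / expected_pmol


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the arithmetic.
    glyco = thr_sac_pmol([10.0, 6.0], pmol_per_area_val=0.25)  # Thr(Sac)1 + Thr(Sac)2
    expected = expected_yield(200.0, 0.95, cycle=16)           # Thr121 = cycle 16 from Met106
    print(f"{glyco:.1f} pmol Thr(Sac), "
          f"{percent_glycosylated(glyco, expected):.1f}% glycosylated")
```

Summing all glycoform peaks before correcting matters because a single glycosylated residue can be split across several peaks (for example Thr(Sac)1 and Thr(Sac)2).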

Monosaccharide  Compositional  Analysis  of  PTH-Ser(Sac) 

The PTH-glycoamino acids (approximately 400 pmol) were collected from the MilliGen ProSequencer™ on-line HPLC. An equal volume of 4 M TFA was added and the sample was hydrolysed at 100°C for 4 h. After evaporation of the acid, the liberated monosaccharides were analysed by HPLC on a Dionex CarboPac PA1™ column (4 mm × 250 mm) with a Waters 600 LC system and a 464 pulsed amperometric electrochemical detector. The sugars were eluted isocratically with 15 mM NaOH and identified by comparison with standards. 2-Deoxyglucose was used as an internal standard for quantitation.
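Quantitation against an internal standard such as 2-deoxyglucose usually follows the response-factor scheme sketched below. This is a generic sketch, not the authors' stated calculation: it assumes response factors are measured from a standard mixture containing a known amount of the internal standard, and the sugar labels and peak areas are hypothetical (the labels follow the composition names used in Table 2).

```python
# Generic internal-standard quantitation for HPAEC-PAD monosaccharide data.
# Assumption: response factors come from a standard mixture run with a
# known amount of the internal standard (IS), here 2-deoxyglucose.


def response_factors(std_areas, std_pmol, is_area, is_pmol):
    """RF_i = (area_i / pmol_i) / (area_IS / pmol_IS) for each standard sugar."""
    is_response = is_area / is_pmol
    return {s: (std_areas[s] / std_pmol[s]) / is_response for s in std_areas}


def quantify(sample_areas, is_area_sample, is_pmol_added, rf):
    """pmol_i = (area_i / area_IS) * pmol_IS / RF_i."""
    return {
        s: sample_areas[s] / is_area_sample * is_pmol_added / rf[s]
        for s in sample_areas
    }


def molar_ratios(pmol, reference):
    """Composition relative to one sugar (e.g. GalNAc = 1)."""
    return {s: v / pmol[reference] for s, v in pmol.items()}


if __name__ == "__main__":
    # Hypothetical peak areas, for illustration only.
    rf = response_factors({"GalNAc": 50.0, "Gal": 80.0},
                          {"GalNAc": 1000.0, "Gal": 1000.0},
                          is_area=100.0, is_pmol=1000.0)
    amounts = quantify({"GalNAc": 20.0, "Gal": 32.0}, 100.0, 1000.0, rf)
    print(amounts, molar_ratios(amounts, "GalNAc"))
```

A 1:1 ratio, as in this made-up example, is the kind of result that would support a GalNAc1Gal1 assignment; a low glucose background matters because any contaminating glucose would otherwise be counted against the sample.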


RESULTS 

Solid-phase sequence analysis of glycophorin A, bovine κ-casein, ovomucoid and human mutant albumin Casebrook was used to characterize their glycosylated amino acids by retention time, peak distribution pattern, chromatographic mobility shift following exoglycosidase treatment and monosaccharide composition. These results were made possible by using low-molarity acidic solvents as solvent A and acetonitrile as solvent B, with on-line C18 reversed-phase HPLC analysis of the PTH-glycoamino/amino acids in either preparative or analytical mode.

Separation  of  PTH-Amino  Acids  in  TEAF  Buffer 

The gradient of 5 mM TEAF buffer, pH 4.0, and acetonitrile (Table 1) effectively resolves the 20 amino acids in an 8 min window (Fig. 1). The elution order of the first 11 amino acids (Asp, Asn, Ser, Gln, Thr, Glu, Gly, His, Ala, Tyr and Arg) is similar to that of the sodium acetate/tetrahydrofuran buffer system, except that Glu elutes before Gly. The final 9 amino acids (pyridylethyl (PE)-Cys, Met, Val, Pro, Trp, Lys, Phe, Ile and Leu) have an elution profile typical of chromatograms obtained with the MilliGen ProSequencer™ system equipped with a Waters PicoTag™ C18 column and an ammonium acetate/acetonitrile gradient (Gooley et al., 1991). Dehydroalanine (from Ser) and dehydro-α-aminobutyric acid (from Thr) elute at 11.8 and 12.1 min, respectively, and are monitored by simultaneous detection at 269 nm and 313 nm (data not shown). The Edman degradation by-products dimethylphenylthiourea (DMPTU, 10.4 min) and diphenylthiourea (DPTU, 14 min, which co-elutes with Trp) are not problematic in solid-phase sequencing and do not interfere with amino acid assignments. The only disadvantage of the low-molarity TEAF buffer system is that His and Arg are very sensitive to variations in pH and ionic strength. However, careful titration of the buffer provides a satisfactory elution position for both these amino acids (Fig. 1).

The elution positions of all 20 amino acids were confirmed by N-terminal sequence analysis of human glycophorin A and ovomucoid, whose sequences are known.


Figure 1. A C18 HPLC chromatogram showing the elution profile of 19 PTH-amino acid standards routinely encountered in N-terminal sequence analysis. The PTH-amino acids are (in order of elution): Asp (D), Asn (N), Ser (S), Gln (Q), Thr (T), Glu (E), Gly (G), His (H), DMPTU at 10.4 min, Ala (A), Tyr (Y), Arg (R), Met (M), Val (V), Pro (P), Trp (W), which co-elutes with DPTU, Lys (K), Phe (F), Ile (I) and Leu (L). PE-Cys (PE-C) is not routinely included in our PTH-amino acid standards mixture, so its elution time was determined separately; its elution position is indicated on the profile by an arrow. The 20 PTH-amino acids were separated using 5 mM TEAF, pH 4.0, as solvent A and acetonitrile as solvent B.


Separation  of  PTH-Glycoamino  Acids  in  TEAF  Buffer 

Reversed-phase elution of PTH-amino acids with 5 mM TEAF, pH 4.0, provides a clear chromatographic window for the elution of PTH-Asn(Sac), PTH-Ser(Sac) and PTH-Thr(Sac), which all elute before PTH-Asp (Fig. 2, Table 2). PTH-Asn(Sac) from the Casebrook tryptic glycopeptide Arg485-Lys500 elutes first from the column as two peaks: a heterogeneous major peak, Asn(Sac)1, at 6.9 min, and a small peak, Asn(Sac)2, at 7.65 min (Fig. 2d). PTH-Ser(Sac) from human glycophorin A consistently elutes as two peaks, Ser(Sac)1 at 7.25 min and Ser(Sac)2 at 7.85 min. The first peak elutes as a poorly resolved doublet, and Ser(Sac)2 co-elutes with Thr(Sac)2 (Fig. 2b and c, Table 2). Thr(Sac) elutes as two major peaks, Thr(Sac)1 at 7.5 min and Thr(Sac)2 at 7.85 min, and two minor peaks, Thr(Sac)3 at 8.25 min and Thr(Sac)4 at 8.65 min (Fig. 2b, Table 2).




Figure 2. A comparison of the C18 HPLC elution profiles for the three main groups of PTH-glycoamino acids, Asn(Sac), Ser(Sac) and Thr(Sac), separated with 5 mM TEAF buffer, pH 4.0, as solvent A. (a) The first five PTH-amino acids to elute from the column: Asp (8.0 min), Asn (8.3 min), Ser (8.75 min), Gln (8.89 min) and Thr (9.2 min). (b) PTH-Thr(Sac) and (c) PTH-Ser(Sac) from the N-terminal sequence of human glycophorin A after 4 and 2 cycles of Edman degradation, respectively. (d) PTH-Asn(Sac) from the albumin Casebrook tryptic glycopeptide after 10 cycles of Edman degradation.
Table 2. Retention times of PTH-glycoforms and parent PTH-amino acids

PTH-(glyco)amino acid    PTH-oligosaccharide       Retention time (Rt, min)
Asn(Sac)1                GlcNAc2Man3GlcNAc2Gal2    6.85
Asn(Sac)2                GlcNAc2Man2*              7.65
Ser(Sac)1                GalNAc1Gal1               7.25
Ser(Sac)2                -                         7.85
Thr(Sac)1                GalNAc1Gal1               7.50
Thr(Sac)2                GalNAc1Gal1?              7.85
Thr(Sac)3                GlcNAc or GalNAc          8.25
Thr(Sac)4                GlcNAc or GalNAc?         8.65
Asn                      -                         8.30
Ser                      -                         8.75
Thr                      -                         9.16

*Previously reported as GlcNAc2Man1 (Gooley et al., 1994a).


One major advantage of a unique chromatographic window for the glycosylated amino acids is that partially glycosylated amino acids can be detected free from the background of the other amino acids. This is best demonstrated by sequence analysis of the bovine κ-casein glycopeptide Met106-Thr124. Figure 3a shows the corrected yields for this peptide from Asn114 to Asn123. Unmodified PTH-Thr and PTH-Ser are recovered in low yield because their dehydro forms are produced during the coupling and cleavage reactions. Hence these two amino acids are rarely quantitated, and it generally suffices to assign them on the basis of their dehydro forms, which are detected at 313 nm.

However, the glycosylated forms of Ser and Thr are recovered in high yield, and it is possible to determine how much of the amino acid is modified (see Materials and Methods). It was estimated that 5% of Thr121 is glycosylated. An enlarged section of the Thr121 chromatogram is shown in Fig. 3b, with the previous cycle (Pro120) overlaid. The solid line is the 5% glycosylated Thr121, which is clearly visible after 16 cycles of Edman degradation. The two major glycoforms of Thr(Sac) (Thr(Sac)1 and Thr(Sac)2, eluting at 7.9 and 8.2 min respectively, while Thr elutes at 9.1 min) are easily distinguished above the preceding-cycle overlay (dotted line, Fig. 3b).

Monosaccharide Analysis of Glycosylated PTH-Amino Acids

PTH-glycoamino acids released by solid-phase Edman degradation of the N-terminal extracellular domain of human GpA (PTH-Ser(Sac)1 and 2 at position 2 and PTH-Thr(Sac)1 and 2 at position 4), of the spacer domain of PsA of Dictyostelium discoideum (PTH-Thr(Sac)3 and 4 at position 91) and of the tryptic peptide of albumin Casebrook (PTH-Asn(Sac)1 and 2 at position 494) were collected and characterised by monosaccharide compositional analysis (Table 2). The low-molarity elution buffers kept glucose contamination low enough for it to be completely resolved from the component sugars by HPAEC.

Chromatographic Shift of PTH-Asn(Sac) following β-Galactosidase Treatment

Human mutant albumin Casebrook has a single biantennary N-linked oligosaccharide at Asn494. The PTH-glycoamino acid released by Edman degradation showed chromatographic heterogeneity after 10 cycles (Fig. 2d). The intact tryptic glycopeptide (Arg485-Lys500) was subjected to a time course of digestion with β-galactosidase to see whether the observed heterogeneity was due to the sequential loss of galactose. There was a progressive shift of the major peak of PTH-Asn(Sac)1 (6.85 min) to a longer retention time, with a peak at 7.05 min appearing after 2 h and the final product after 24 h of incubation eluting at 7.15 min (Fig. 4). Monosaccharide analysis of this peak showed that no galactose was present. The heterogeneity of the PTH-glycoamino acid could thus be attributed to loss of terminal galactose residues during the Edman chemistry. These glycoforms were separated under the chromatographic conditions used (Fig. 4, Table 3).

Figure 3. (a) Corrected yields for the solid-phase Edman degradation from Asn114 to Asn123 of the glycopeptide Met106-Thr124 from bovine κ-casein macroglycopeptide. The recovery of both Glu (E) and Asp (D) was low, since they remain covalently attached to the Sequelon AA™ membrane following carbodiimide activation of the side-chain carboxyl groups. Non-glycoamino acids are shown as shaded bars and glycoamino acids as solid black bars. (b) A chromatogram overlay of cycle 15 (dotted line) and cycle 16 (solid line), focused on the elution positions of both PTH-Thr(Sac)121 and PTH-Thr121 from the partially glycosylated peptide Met106-Thr124 following solid-phase Edman degradation. The peak at 8.5 min is a lag from Asn114. The chromatography is as described in Materials and Methods, except that solvent A was 2 mM acetic acid.

Table 3. Resolution of glycoforms on PTH-Asn494 from albumin Casebrook

PTH-oligosaccharide                     Retention time (Rt, min)
GlcNAc2Man3GlcNAc2Gal2 (Asn(Sac)1)      6.85
GlcNAc2Man3GlcNAc2Gal1                  7.05
GlcNAc2Man3GlcNAc2                      7.15
GlcNAc2Man2 (Asn(Sac)2)                 7.40

Figure 4. Solid-phase Edman degradation of the human mutant albumin Casebrook tryptic glycopeptide following digestion with β-galactosidase. Chromatograms show the PTH-Asn(Sac) glycoamino acids separated after 10 cycles of Edman degradation of (a) the control glycopeptide and of the glycopeptide incubated with enzyme for (b) 2 h and (c) 24 h.


DISCUSSION 

The accepted practice of ignoring modified amino acids during routine sequence analysis is no longer necessary, as glycoamino acids can be identified as part of solid-phase Edman degradation. This is particularly important for the quality control of products from eukaryotic expression systems. These modified amino acids also hold key insights into the primary structure motifs that confer modifications. We have previously developed a method for detecting and characterizing glycosylated Asn, Ser and Thr (Gooley et al., 1994b; Pisano et al., 1994). However, for routine microsequencing that methodology was incompatible with core facilities, where an instrument cannot be dedicated solely to sequencing known glycosylated proteins.

Here we describe the development of a triethylammonium formate (TEAF)-buffered system for the routine analysis of PTH-glycoamino acids released by N-terminal solid-phase Edman degradation with on-line C18 reversed-phase HPLC. This system can separate both glycosylated and non-glycosylated PTH-amino acids in one chromatographic run. We also show that it is possible to obtain saccharide sequence/structure information by monitoring the retention times of PTH-glycoamino acids following exoglycosidase treatment.
PTH-Amino Acid/Glycoamino Acid Chromatography

The new TEAF system resolves all 20 amino acids (Fig. 1) and separates the three major groups of glycoamino acids into their own chromatographic space (Fig. 2b-d). The TEAF buffer system provides a 1.5 min "glycosylation window" for the unambiguous detection of glycosylated Asn, Ser and Thr residues, including partially glycosylated forms (Fig. 3). This has been demonstrated by the N-terminal sequence analysis of Asn(Sac) at position 494 of the albumin Casebrook tryptic glycopeptide (Arg485-Lys500), Ser(Sac) at position 2 and Thr(Sac) at position 4 from the N-terminus of human GpA, and Thr(Sac) in the spacer domain of PsA of Dictyostelium discoideum. The elution profiles of Asn(Sac), Ser(Sac) and Thr(Sac) in the TEAF system were consistent with data obtained by C18 reversed-phase chromatography with 2 mM acetic acid or 2 mM formic acid as solvent A (Pisano et al., 1994; Gooley et al., 1994b).

The separation between Asn(Sac)1 and Asn(Sac)2 has been improved with TEAF, so that Asn(Sac)2 no longer co-elutes with the major Ser(Sac) glycoform, Ser(Sac)1. The separation between Ser(Sac)1 and Thr(Sac)1 has also increased, by 0.35 min (Table 2; Pisano et al., 1994). The pattern of glycosylated Thr(Sac) peaks (Fig. 2b) is identical to that observed for the rat CD8α hinge peptide (Gooley et al., 1991) and human GpA (Pisano et al., 1993), with two major peaks, Thr(Sac)1 and Thr(Sac)2, eluting well before Thr (Fig. 2a and b). This characteristic pair of peaks probably represents diastereomeric forms of PTH-Thr(Sac), a pattern similar to that obtained for PTH-β-methyl-S-ethylcysteine, the ethanethiol adduct of β-eliminated phosphothreonine (Meyer et al., 1993).

The separation of Thr(Sac)3 and Thr(Sac)4 from Asp, Asn and Ser is also significant, because these peaks have been shown to co-elute with the diastereomeric forms of PTH-Thr(Sac) from the spacer domain of recombinant PsA, a cell surface glycoprotein from Dictyostelium discoideum (Gooley et al., 1991). The PTH-Thr(Sac) from recombinant PsA is believed to be a single GlcNAc residue conjugated to Thr (Gooley et al., in preparation). Single GlcNAc residues have recently been reported with increasing frequency on many intracellular proteins, such as nuclear pore proteins (Hart et al., 1989), neurofilaments (Dong et al., 1993) and keratins (Ku and Omary, 1994).

The ability to detect partially glycosylated residues depends on the PTH-glycoamino acids eluting in a region of the chromatogram with low background noise. PTH-glycoamino acids separated with 2 mM acetic acid, 2 mM formic acid or TEAF buffer all have very low PTH-amino acid background in the "glycosylation windows". Partially glycosylated sites are typical of κ-casein (Pisano et al., 1994), and their detection was made possible by the early elution of the glycoforms.


Monosaccharide  Analysis 

Monosaccharide analysis is an important first step in the characterization of oligosaccharides, since mass spectrometry alone cannot distinguish between isomeric sugars.

We have shown that it is possible to collect a PTH-glycoamino acid directly from the sequencer in TEAF buffer/acetonitrile and subject it to monosaccharide analysis (Table 2). TEAF, like 2 mM acetic acid and 2 mM formic acid, is volatile and compatible with ion-spray mass spectrometry.

β-Galactosidase Digest of Albumin Casebrook Tryptic Glycopeptide

Not all protein analysis core facilities are equipped with HPAEC and ion-spray capabilities. The technique of on-line Edman degradation described in this paper permits the sequence of the oligosaccharide(s) to be determined by exoglycosidase treatment and subsequent monitoring of the products.

The N-linked glycosylation site at Asn494 of serum albumin Casebrook has been well characterized by both monosaccharide and NMR analysis and was found to carry a single NeuAc2Gal2GlcNAc2Man3GlcNAc2 structure (Haynes et al., 1992). Recently, the desialylated PTH-Asn(Sac)1 was collected after 10 cycles of Edman degradation of the Casebrook tryptic glycopeptide (Arg485-Lys500) and subjected to compositional analysis and ion-spray mass spectrometry. The major mass was consistent with PTH-Asn-GlcNAc2Man3GlcNAc2Gal2 (Gooley et al., 1994). The mass spectra of PTH-Asn(Sac)494 from the tryptic glycopeptide also contained secondary masses corresponding to the loss of one and two hexose residues. The chromatography after exoglycosidase treatment shows that these were Edman degradation by-products and not fragment ions arising from the mass analysis process (Gooley et al., 1994a). Treatment with β-galactosidase (Fig. 4) showed that the degradation was due to the loss of terminal galactose residues. We have thus defined the retention times of PTH-Asn carrying a biantennary N-linked oligosaccharide that has lost one galactose as 7.05 min, and two galactoses as 7.15 min (Table 3).
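The hexose-loss series mentioned above can be illustrated by computing monoisotopic masses for the PTH-Asn glycoforms of Table 3. This is a sketch under stated assumptions, not data from this study: neutral monoisotopic masses, PTH-Asn taken as C11H11N3O2S, and glycan residues added as anhydro units (HexNAc C8H13NO5, Hex C6H10O5), with mannose and galactose both counted as hexose.

```python
# Monoisotopic masses of PTH-Asn glycoforms (neutral species, no adducts).
# Assumptions: PTH-Asn = C11H11N3O2S; each sugar is added as an anhydro
# residue, so glycosidic/glycosylamine water losses are already included.

ATOM = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052,
        "O": 15.9949146221, "S": 31.97207069}


def mono_mass(formula):
    """Monoisotopic mass (Da) from an element-count dict."""
    return sum(ATOM[el] * n for el, n in formula.items())


PTH_ASN = mono_mass({"C": 11, "H": 11, "N": 3, "O": 2, "S": 1})
HEXNAC = mono_mass({"C": 8, "H": 13, "N": 1, "O": 5})  # GlcNAc/GalNAc residue
HEX = mono_mass({"C": 6, "H": 10, "O": 5})             # Man/Gal residue


def pth_asn_glycoform(n_hexnac, n_hex):
    """PTH-Asn carrying n_hexnac HexNAc and n_hex Hex residues."""
    return PTH_ASN + n_hexnac * HEXNAC + n_hex * HEX


if __name__ == "__main__":
    # GlcNAc2Man3GlcNAc2Gal(n): 4 HexNAc + (3 Man + n Gal) hexoses
    for n_gal in (2, 1, 0):
        mass = pth_asn_glycoform(4, 3 + n_gal)
        print(f"PTH-Asn-GlcNAc2Man3GlcNAc2 + {n_gal} Gal: {mass:.3f} Da")
```

Each galactose lost lowers the mass by one hexose residue (about 162.05 Da), which is the spacing expected for the secondary masses; mass alone cannot tell which hexose was lost, which is why the β-galactosidase chromatography is informative.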

Reversed-phase chromatography of PTH-glycoamino acids obtained by solid-phase Edman degradation was first demonstrated by Gooley et al. (1991). We have now shown that it is possible to monitor single-saccharide changes in PTH-glycoamino acids using a TEAF/acetonitrile buffer system.

Using this system it is possible:

1. to create a retention-time database for the on-line detection of different oligosaccharide-amino acid linkages, as a routine first step in the characterization of glycosylation sites released by solid-phase Edman degradation; and

2. to subject either the glycopeptide or the PTH-glycoamino acid to exoglycosidase treatment and monitor the change in the chromatographic profile of the PTH-glycoamino acid.
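The retention-time database proposed in point 1 could start as simply as the hypothetical lookup below, built from the Table 2 and Table 3 values for the TEAF system. The matching tolerance is an assumption and would have to be set from run-to-run reproducibility on the instrument in use; Asn(Sac)2 is left out because Tables 2 and 3 list different retention times for it (7.65 and 7.40 min).

```python
# Hypothetical retention-time lookup built from Tables 2 and 3 (TEAF system).
# The matching tolerance is an assumption, not a value from the text.

RT_DB = [  # (assignment, retention time in min)
    ("Asn(Sac)1, GlcNAc2Man3GlcNAc2Gal2", 6.85),
    ("Asn(Sac), GlcNAc2Man3GlcNAc2Gal1", 7.05),
    ("Asn(Sac), GlcNAc2Man3GlcNAc2", 7.15),
    ("Ser(Sac)1, GalNAc1Gal1", 7.25),
    ("Thr(Sac)1, GalNAc1Gal1", 7.50),
    ("Ser(Sac)2 or Thr(Sac)2", 7.85),
    ("Thr(Sac)3, GlcNAc or GalNAc", 8.25),
    ("Asn", 8.30),
    ("Thr(Sac)4, GlcNAc or GalNAc?", 8.65),
    ("Ser", 8.75),
    ("Thr", 9.16),
]


def candidates(rt, tol=0.08):
    """All database entries within tol min of rt, closest first."""
    hits = [(abs(rt - ref), label) for label, ref in RT_DB if abs(rt - ref) <= tol]
    return [label for _, label in sorted(hits)]


if __name__ == "__main__":
    # e.g. the major PTH-Asn494 peak before (6.85) and after (7.15) a 24 h
    # beta-galactosidase digest, and an ambiguous peak near Asn.
    for rt in (6.85, 7.15, 8.27):
        print(f"{rt:.2f} min:", candidates(rt) or ["no match"])
```

Returning every entry within the tolerance, rather than only the nearest, keeps close neighbours such as Thr(Sac)3 and Asn visible as alternatives, and comparing the candidates before and after an exoglycosidase digest is the kind of profile change described in point 2.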


CONCLUSION 

Solid-phase Edman degradation with on-line C18 reversed-phase chromatography using TEAF/acetonitrile has been developed for the routine detection of PTH-glycoamino acids. This system is useful for analytical and preparative scale PTH-glycoamino acid analysis. The PTH chromophore is a convenient "tag" for monitoring site-specific changes in glycosylation.
INTRODUCTION

Proteins with a blocked N-terminus are common. Frequently the modification involves an acetyl, formyl or pyroglutamyl moiety coupled to the α-amino group, and direct sequence analysis by Edman degradation is not possible. Several enzymatic and chemical methods to remove the blocking group have been suggested (cf. Tsunasawa and Hirano, 1993), but they often suffer from poor yields and extensive undesired peptide bond cleavage. Acetylation is the most frequent N-terminal modification and is found in alcohol dehydrogenases, among many other proteins. To circumvent the conventional approach to sequence analysis of blocked proteins (i.e. proteolytic cleavage, HPLC of fragments and internal sequence analysis), we have tested direct chemical deacetylation using a mixture of trifluoroacetic acid and methanol (Gheorghe et al., 1995). In this manner, drawbacks such as high protein consumption, long handling times and inaccessibility of the N-terminal fragment to Edman degradation are avoided. The protocol has been applied both to a synthetic peptide corresponding to the N-terminal segment of horse liver alcohol dehydrogenase and to the intact protein.

A technique that obtains internal sequence information from N-terminally blocked proteins and from partially sequenced polypeptides, while also saving protein material, is treatment with cyanogen bromide directly on the sequencer filter. We have tested this approach for the analysis of new structures and the identification of known proteins (Bergman, 1994). Polypeptides bound to the sequencer filter are cleaved in situ with CNBr, and the resulting internal sequences are then analyzed. Interpretation is facilitated by the varying extent to which peptide bonds are cleaved after individual methionines. Both electroblotted samples and samples applied in solution have been treated with CNBr after initial sequence analysis for the necessary number of cycles. In this manner, both unknown and known proteins available in amounts sufficient for only one sequencer application can be analyzed and identified, even if they are blocked at the N-terminus.


Methods  in  Protein  Structure  Analysis,  Edited  by  M.  Z.  Atassi  and  E.  Appella 
Plenum  Press,  New  York,  1995 
EXPERIMENTAL PROCEDURES

Deacetylation was tested both with an N-terminal fragment of horse liver alcohol dehydrogenase (residues 1-14) and with the intact protein (374 residues; cf. Jörnvall, 1970). The peptide fragment was synthesized on an Applied Biosystems 430A instrument using side-chain-protected tert-butyloxycarbonyl amino acid derivatives (cf. Kent, 1988). N-terminal acetylation was performed before cleavage from the resin and deprotection (treatment with acetic anhydride/triethylamine/dichloromethane (9:4:87, by vol.) for 10 min at room temperature) to avoid simultaneous modification of the lysine residues present. Horse liver alcohol dehydrogenase was purchased from Sigma. Deacetylation for N-terminal sequence analysis of both peptide and intact protein was performed with a mixture of trifluoroacetic acid (TFA) and methanol (MeOH). The samples were carefully lyophilized to complete dryness in small (500 µl) capped plastic tubes, after which 100 µl of freshly prepared TFA/MeOH (1:1, by vol.) was added. The tubes were closed and, after brief vortexing, the samples were incubated at 43°C for three days. After this treatment, the reagents were removed under vacuum and the products were analyzed by both capillary electrophoresis and Edman degradation. Capillary electrophoresis was performed on a Beckman P/ACE 2100 system operated as described (Bergman et al., 1991), and sequence analysis on an Applied Biosystems 470A instrument with reversed-phase HPLC of the phenylthiohydantoin amino acids, essentially as described (Kaiser et al., 1988).

Cyanogen bromide cleavage of a protein immobilized on a sequencer filter (Polybrene-treated
glass fiber or polyvinylidene difluoride (PVDF)) was carried out with a
solution of 0.2 g CNBr/ml in 70% formic acid for 22-26 h at room temperature. After a sufficient
number of Edman cycles, the filter was placed in an Eppendorf tube (1.5 ml) and 30 µl of CNBr
solution was added. A small additional volume (60 µl) was placed in the bottom of the tube,
below the filter, to maintain a CNBr-saturated atmosphere. Nitrogen gas was introduced and
incubation was performed in the dark. Following this treatment, the filter was dried under
vacuum and reapplied to an Applied Biosystems 470A sequencer.
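The amount of reagent involved can be checked with simple arithmetic. The sketch below assumes that only the 30 µl pipetted onto the filter is in contact with the sample (the 60 µl below the filter serves to saturate the atmosphere) and uses a molar mass of about 105.92 g/mol for CNBr; the helper `cnbr_molar_excess`, the protein amount and the methionine count in the example are hypothetical.

```python
from typing import Tuple

# Approximate molar mass of cyanogen bromide, BrCN (g/mol).
CNBR_MOLAR_MASS = 105.92


def cnbr_molar_excess(conc_g_per_ml: float, volume_ul: float,
                      protein_pmol: float, met_per_chain: int) -> Tuple[float, float]:
    """Return (µmol of CNBr delivered, molar excess of CNBr over Met)."""
    mol_cnbr = conc_g_per_ml * (volume_ul / 1000.0) / CNBR_MOLAR_MASS
    mol_met = protein_pmol * 1e-12 * met_per_chain
    return mol_cnbr * 1e6, mol_cnbr / mol_met


if __name__ == "__main__":
    # 30 µl of 0.2 g/ml CNBr on the filter; 100 pmol of a hypothetical
    # chain containing 7 methionines.
    umol, excess = cnbr_molar_excess(0.2, 30, 100, 7)
    print(f"{umol:.1f} µmol CNBr, about {excess:.1e}-fold excess over Met")
```

For these illustrative numbers the filter receives about 57 µmol of CNBr, roughly an 8 × 10^4-fold molar excess over methionine, so the reagent is far from limiting at the pmol level.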


RESULTS AND DISCUSSION

A combination of trifluoroacetic acid and methanol was found to cleave the N-terminal
acetyl group of polypeptides with high specificity (i.e., with a low extent of simultaneous internal
peptide bond cleavage). The approach was tested on a synthetic peptide (14 residues) corresponding
to the N-terminal segment of horse liver alcohol dehydrogenase and on the intact protein (374
residues). Deacetylation was monitored as a function of reaction time, temperature and the ratio
of trifluoroacetic acid to methanol, using capillary electrophoresis and sequence analysis.
The results indicate that a 1:1 (by vol.) mixture of TFA/MeOH added to the lyophilized sample,
followed by incubation for three days at 43°C, is efficient (Gheorghe et al., 1995). Both the peptide
fragment and the much larger protein molecule are deblocked without extensive cleavage of
internal peptide bonds. Capillary electrophoresis of the 14-residue peptide before and after
deacetylation reveals that only 17% of the blocked structure remains, while the deacetylated but
otherwise intact peptide corresponds to the major peak and represents 65% of the total sample
(Gheorghe et al., 1995). The extent of undesirable internal cleavage is low: only a minor peak,
corresponding to a product of cleavage after the glycine at position 4 (cf. Jörnvall,
1970), can be detected (18% of the total sample). Sequence analysis after deacetylation of the
N-terminal fragment (Fig. 1) and of the intact protein gives initial yields of up to 60%. Interestingly,
the ratios of deblocking to unspecific cleavage of internal bonds are similar for the peptide (7:1)
and the protein (8:1), despite the much larger size of the latter molecule (374 instead of 14
residues) (Gheorghe et al., 1995). Consequently, the results are promising for the direct analysis of
N-terminally acetylated proteins by Edman degradation after deblocking of the intact protein.

Electroblotting is efficient for the recovery of proteins separated at the low-pmol level
in SDS/polyacrylamide gels, and the blotted material is easily accessible for direct chemical
cleavage with cyanogen bromide. A 42 kDa DNA-binding phosphoprotein (cf. Egyhazi et
al., 1991) was electroblotted onto a Polybrene-treated glass fiber filter disc as described
(Bergman and Jörnvall, 1987). Although the total amount of sample available, 360 pmol,
was applied to the electrophoresis gel, no significant sequence could be detected when the
blotted protein was analyzed for 16 Edman cycles, establishing the absence of a free α-amino
group in this 42 kDa polypeptide. The filter was subsequently removed from the sequencer
and treated with CNBr. After reapplication of the filter to the sequencer, several sequences
appeared, and at least one major sequence could be interpreted for nine cycles (Fig. 2). The
sequencer initial yield was 60 pmol, or 17% of the amount applied to gel electrophoresis.

A sample of human endothelial cell proteins was separated by SDS/polyacrylamide
gel electrophoresis, and a 42 kDa band was isolated by electroblotting (Schuppe-Koistinen
et al., 1995; Bergman and Jörnvall, 1987). Edman degradation of the blotted material
revealed no sequence; since approximately 700 pmol had been loaded onto the electrophoresis
gel, the protein was concluded to be blocked. The filter disc with the blocked polypeptide
was removed from the sequencer after 19 cycles, treated in situ with CNBr and
reapplied. Several sequences were now detected, and a major cyanogen bromide fragment
was analyzed for 18 cycles at a repetitive yield of 97%, which allowed identification of the
42 kDa protein as actin (Schuppe-Koistinen et al., 1995).
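The repetitive yield quoted above describes how much of the sequenceable material survives each Edman cycle; at 97%, the signal at cycle 18 is still about 0.97^17 ≈ 0.60 of that at cycle 1. The text does not say how the value was derived, so the sketch below assumes a common approach, a least-squares fit of ln(yield) against cycle number, which also extrapolates the initial yield. `fit_edman_yields` is a hypothetical helper, and the yields in the example are synthetic.

```python
import math
from typing import Sequence, Tuple


def fit_edman_yields(cycles: Sequence[int],
                     yields_pmol: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(yield) = ln(I0) + (cycle - 1) * ln(R).

    Returns (I0, R): the initial yield extrapolated to cycle 1 (pmol) and
    the repetitive yield as a fraction per cycle.
    """
    xs = [n - 1 for n in cycles]
    ys = [math.log(y) for y in yields_pmol]
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


if __name__ == "__main__":
    cycles = list(range(1, 19))
    # Synthetic, noise-free yields: 50 pmol at cycle 1, 97 % per cycle.
    yields = [50.0 * 0.97 ** (n - 1) for n in cycles]
    i0, r = fit_edman_yields(cycles, yields)
    print(f"I0 = {i0:.1f} pmol, R = {100 * r:.1f} %")
    print(f"signal remaining at cycle 18: {r ** 17:.2f} of cycle 1")
    # Initial yield relative to the amount loaded, e.g. 60 of 360 pmol.
    print(f"60 pmol of 360 pmol loaded = {100 * 60 / 360:.0f} %")
```

With real data, residues that are recovered poorly as PTH derivatives are usually left out of such a fit, since they distort the slope.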

Transthyretin (TTR) associated with amyloid deposits in the heart or in nerve tissue is
known to be a highly heterogeneous mixture of N-terminally blocked and truncated polypeptides
with structures identical to segments of the plasma TTR sequence except for point mutations at
different positions. A sample of amyloid-related TTR isolated from cardiac tissue was separated
by SDS/polyacrylamide gel electrophoresis, and a major band at 14.5 kDa was recovered by
electroblotting onto a PVDF membrane (Hermansen et al., 1995; cf. Matsudaira, 1987).
Sequence analysis for 14 cycles revealed a polypeptide starting at position 49 in the plasma
TTR sequence (cf. Kanda et al., 1974). However, the initial yield in the Edman degradation was
unexpectedly low, only 6% of the material applied to the gel, and the protein was therefore
concluded to be partially blocked. After in situ CNBr cleavage and reapplication of the
PVDF membrane to the sequencer, two additional sequences were detected, one starting at
position 14 and the other at position 112 in the plasma TTR structure (cf. Kanda et al., 1974).
This result clearly shows that the cardiac amyloid TTR sample contains a fraction
of blocked polypeptides starting at positions before residue 14. Furthermore, since the
sequence of plasma TTR contains only one methionine, at position 13, the second fragment
detected after CNBr cleavage (starting at residue 112) implies a methionine at position 111, and
thus a point mutation, in the amyloid-related TTR isolated from heart tissue (Hermansen et al., 1995).
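The TTR argument can be expressed as a simple consistency check: CNBr only generates new N-termini directly after methionine, so every observed fragment start should be preceded by a Met in the reference sequence. In the sketch below, `unexplained_cnbr_starts` is a hypothetical helper, and the reference is a toy stand-in with a single Met at position 13 (mirroring plasma TTR), not the real TTR sequence.

```python
from typing import Dict, Iterable, Tuple


def unexplained_cnbr_starts(reference: str,
                            observed_starts: Iterable[int]) -> Dict[int, Tuple[int, str]]:
    """Flag CNBr fragment N-termini that the reference sequence cannot explain.

    A fragment starting at 1-based position s requires a Met at s - 1.
    For each observed start without such a Met in the reference, return
    {s: (s - 1, residue found at s - 1 in the reference)}.
    """
    explained = {i + 2 for i, aa in enumerate(reference) if aa == "M"}
    flagged: Dict[int, Tuple[int, str]] = {}
    for s in sorted(set(observed_starts)):
        if s > 1 and s not in explained:
            residue = reference[s - 2] if s - 2 < len(reference) else "?"
            flagged[s] = (s - 1, residue)
    return flagged


if __name__ == "__main__":
    # Toy stand-in for plasma TTR: a single Met at position 13
    # (not the real TTR sequence).
    reference = "A" * 12 + "M" + "A" * 110
    print(unexplained_cnbr_starts(reference, [14, 112]))  # {112: (111, 'A')}
```

A flagged start only points to the position where a substitution to methionine would explain the extra fragment; the substitution itself still has to be confirmed by sequence data.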

The amino acid sequence of procarboxypeptidase A2 (PCP A2) from rat pancreas is
known from the corresponding cDNA (Gardell et al., 1988). Its N-terminal structure,
Gln-Glu-Thr-Phe-, suggests the presence of a blocking pyroglutamic acid residue
at the N-terminus. An HPLC-purified sample was analyzed for 15 Edman cycles after
application of 100 pmol (the total amount available) and, as expected, no sequence data were
obtained. To identify the blocked protein, in situ CNBr cleavage was performed, followed
by reapplication of the filter to the sequencer (Oppezzo et al., 1994). The seven
sequences detected correspond to cleavage after each methionine present in the PCP A2
structure (cf. Gardell et al., 1988). The fragment sequences could be traced for up to ten cycles
of Edman degradation. Interpretation was facilitated by the differing cleavage efficiencies
at the individual methionines. The major cleavage, after Met-271, gave a fragment with
an initial yield of 52 pmol (52% of the starting material) (Oppezzo et al., 1994).
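When seven fragments are sequenced simultaneously, as for PCP A2, each cycle yields a mixture of PTH amino acids. One simple way to assign such mixtures, sketched below under the assumption that the cDNA-derived sequence is correct, is to predict the fragment after each Met and count in how many cycles its expected residue is among those detected. `score_cnbr_fragments` is a hypothetical helper, the reference and the calls are invented, and the score ignores the yield differences between fragments that helped the interpretation described above.

```python
from typing import Dict, List, Set, Tuple


def score_cnbr_fragments(reference: str,
                         calls: List[Set[str]]) -> Dict[int, Tuple[int, int]]:
    """Score each predicted CNBr fragment against mixed Edman calls.

    reference: one-letter sequence (for example, deduced from a cDNA).
    calls: for each Edman cycle, the set of PTH amino acids detected.
    Returns {1-based fragment start: (cycles matched, cycles compared)}.
    """
    scores: Dict[int, Tuple[int, int]] = {}
    for i, aa in enumerate(reference[:-1]):  # a C-terminal Met yields nothing
        if aa != "M":
            continue
        start = i + 1  # 0-based index of the residue after Met
        expected = reference[start:start + len(calls)]
        hits = sum(1 for cycle, res in enumerate(expected) if res in calls[cycle])
        scores[start + 1] = (hits, len(expected))
    return scores


if __name__ == "__main__":
    reference = "GAMKLPEWMSTYQ"  # invented sequence
    calls = [{"K", "S"}, {"L", "T"}, {"P", "Y"}, {"E"}]
    print(score_cnbr_fragments(reference, calls))  # {4: (4, 4), 10: (3, 4)}
```

With many simultaneous fragments, chance matches become likely, so a fragment is best accepted only when it matches nearly every cycle and its yield is consistent across cycles.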
In conclusion, efficient deblocking of N-terminally acetylated proteins using a
mixture of trifluoroacetic acid and methanol gives sequencer initial yields of up to 60% and
a ratio of deblocking to unspecific cleavage close to 10:1. Similarly, cyanogen bromide
cleavage of N-terminally blocked or partially sequenced polypeptides directly on the sequencer
filter provides an efficient approach to the analysis and identification of protein structures
at the pmol level.
INTRODUCTION

The amino acid sequence of a protein of interest is usually one of the first pieces of information
required in today’s molecular biology, be it for gene cloning or for the synthesis of immunoreactive
peptides. To date, such data have been provided almost exclusively by amino (N)-terminal
sequencing using the Edman degradation. Methods for sequencing proteins from their
carboxy (C)-termini have remained relatively primitive, requiring much protein in return for
little sequence information. Carboxypeptidase digestion is still the most widely used method,
despite its intrinsic limitations of substrate specificity and endoprotease contamination.
Several chemical degradation methods have been reported (Stark, 1968; Yamashita, 1971;
Bailey et al., 1994), and a few automated C-terminal sequencers are close to becoming
commercially available.

Recently, we observed that peptides exposed to the vapor of 90% aqueous
pentafluoropropionic acid at 90°C appeared to have amino acid residues successively
cleaved from their C-termini (Tsugita et al., 1992a). By fast atom bombardment (FAB) or
electrospray ionization mass spectrometry, the molecular ions of the successively C-terminally
degraded products were clearly observed, and the C-terminal amino acid sequence of the peptide
was deduced from the molecular mass differences. The proposed reaction mechanism was the formation
of an oxazolone ring at the C-terminal amino acid, followed by removal of the C-terminal
amino acid residue. In addition to the C-terminal degradation, two specific internal peptide
bond cleavages were observed: at the C-terminal side of internal aspartic acid residues and
at the N-terminal side of serine residues.
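The principle of reading a C-terminal sequence from a ladder of molecular ions can be sketched as follows: sort the ladder by mass and match each mass difference to a residue mass. The sketch assumes standard monoisotopic residue masses and that all ladder members carry the same modifications (for example, the same N-terminal acyl group), so these cancel in the differences; `peptide_mass` and `read_c_terminal_ladder` are hypothetical helpers.

```python
from typing import Iterable, List

# Monoisotopic residue masses (Da). L and I are isobaric and cannot be
# distinguished by mass; K and Q differ by only ~0.036 Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def read_c_terminal_ladder(masses: Iterable[float], tol: float = 0.02) -> List[str]:
    """Read residues from the mass differences of a C-terminal ladder.

    masses: the intact peptide and its C-terminally truncated products, all
    in the same ion form, in any order. Returns one call per step, starting
    at the C-terminus; ambiguous calls are joined with "/", and "?" marks a
    difference that matches no single residue.
    """
    ordered = sorted(masses, reverse=True)
    calls = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls


if __name__ == "__main__":
    ladder = [peptide_mass(s) for s in ("YGGFL", "YGGF", "YGG")]
    print(read_c_terminal_ladder(ladder))  # ['L/I', 'F']
```

With mass data reported to one decimal place, as in Table 4, a tolerance of about 0.2 Da is more realistic; Gln and Lys then also become indistinguishable, and a missing ladder member shows up as "?" because the difference spans two residues.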

In this paper we present an extension of the C-terminal degradation method that uses
perfluoroacyl anhydride vapor instead of perfluoroacid vapor. This method is superior to the
perfluoroacid vapor method because more extensive C-terminal degradation is obtained
and essentially no internal peptide bond cleavage is observed. Preliminary reports of this method
have been presented (Tsugita et al., 1992b; and in two proceedings, Tsugita et al., 1992c, 1993).
We have also recently published a preliminary report of its application to proteins
(Nabuchi et al., 1994).


FURTHER STUDIES ON REACTION METHODS

In previous experiments (Tsugita et al., 1992b, 1992c, 1993), the degradation reaction
was conducted in the presence of acetonitrile.

When the same degradation reaction was performed in the dry glove box, we observed
that acetonitrile was not always needed for the reaction. Comparison of the
FAB mass spectra of the reaction products of the dodecapeptide Ala-Arg-Gly-Ile-Lys-Gly-Ile-Arg-Gly-Phe-Ser-Gly
under different reagent conditions, namely vapor from the usual 30% solution of pentafluoropropionic
anhydride (PFPAA) in acetonitrile (Fig. 1a), PFPAA vapor only (Fig. 1b), and incubation with
vapors from PFPAA and acetonitrile in separate tubes (Fig. 1c), clearly showed that acetonitrile was
not necessary for the reaction to proceed. To clarify these seemingly contradictory data, we
repeated the degradation reaction on the same dodecapeptide using the vapor from 300 µl of
a PFPAA:water mixture of 10:1 (molar ratio) at -20°C for 1 h, either without acetonitrile
(Fig. 1d) or in the presence of acetonitrile (100 µl) (Fig. 1e). Without acetonitrile, the reaction
did not proceed in the presence of water, consistent with our earlier results. However,
when acetonitrile was included, the reaction proceeded normally. We surmise
that acetonitrile absorbs moisture/water from the air, ensuring that the degradation
reaction can take place.

Operations were usually carried out in a glove box (1100 × 650 × 700 mm) that was
continuously flushed with dry nitrogen gas. The sample peptide (2-50 µg) was dried in a small
sample tube (6 × 40 mm) in a vacuum desiccator and then transferred to the reaction tube
(19 × 100 mm; Pierce, Rockford, USA). PFPAA (10% or 30%) or heptafluorobutyric anhydride
(HFBAA; 15%) in acetonitrile was used as the reagent solution. These reagents were obtained in
300 µl ampoules from Nacalai Tesque, Kyoto, Japan. The reagent solution (300-500 µl) was added
to the reaction tube, outside the sample tube(s), while dry nitrogen gas was continuously
flushed into the sample tube(s) located in the reaction tube (Fig. 2(A) and (B)). The
reaction tube was cooled with liquid nitrogen. Care must be taken to maintain dry conditions
when cooling the reagent, as moisture in the air easily condenses and hydrolyses the acid
anhydride. At liquid nitrogen temperature the reaction tube was evacuated (10‘^ Torr) and
sealed. The reaction tube was then transferred to the reaction bath (Histo-bath, Neslab Instrument
Inc., Newington, USA) set at -20°C, where the reaction proceeded for various times.
After the reaction, the reaction tube was again transferred to liquid nitrogen to stop
the reaction. The sample tube was removed from the reaction vessel and dried under
vacuum (see Fig. 2).

Partial oxidation of methionine and tryptophan residues and of unmodified cysteine residues
was observed under the present reaction conditions. Cysteine residues
may be pyridylethylated (Amons, 1987) before the reaction. The peptides employed in this
study contain almost all of the common amino acid residues, and peptides containing these
residues were C-terminally degraded without problems. In an attempt to determine


Figure 1. Effect of acetonitrile as solvent and of added water. The reactions of the dodecapeptide ARGIKGIRGFSG
(10 µg) were carried out at -20°C for 1 h with the vapor of: (a) 30% PFPAA acetonitrile solution
(300 µl), (b) PFPAA (90 µl), (c) PFPAA (90 µl) and acetonitrile (210 µl) in separate tubes, (d) 300 µl of a
PFPAA:water mixture of 10:1 (mol/mol) without acetonitrile and (e) the same mixture with acetonitrile
(100 µl).
Figure 2. Reaction tube(s) and glove box, which contains the desiccator, Dewar bottle and reaction bath; a vacuum pump is also shown.


the relative ease of cleavage, we estimated degradation yields as percentages, roughly
calculated from the peak heights in the mass spectra. Table 1 lists the truncated products observed
in the present experiments from about 20 peptides, including several natural C-terminal
peptides, and Table 2 summarizes the relative cleavage ratios of peptide bonds from more than
fifty experiments, including the peptides listed in Table 1. The vertical column is the amino
Res. = residues; Acyl. = acylation.
# denotes a -18 mass, and ## two -18 masses.
* denotes a -46 mass.
+O denotes an oxidized peptide mass.
Table 2. Degradation ratio of peptide bonds by perfluoroacyl anhydride vapor

Column headings (carboxy-side residue): D N T S E Q P G A V M I L Y F H K R W
Rows (amino-side residue), values in their original left-to-right order; the column position of each value was not preserved in the extracted table:
D: 99, 99, 9, 96, 94
N: 55, 95, 98, 99, 80
T: 22
S: 95, 42, 93, 87, 91
E: 68, 76, 94
Q: 87, 94, 50
P: 91, 65, 89, 40, 12, 90, 14
G: 99, 97, 85, 29, 50, 65, 84, 95, 96
A: 33, 89, 68, 96, 47, 80
V: 69, 75, 27
M: 99, 65
I: 23, 60, 99, 70, 64, 84
L: 91, 38, 90, 81, 90, 90
Y: 94, 21, 96, 75, 75
F: 98, 89, 94, 93, 96, 22, 85, 60, 78, 83
H: 87, 56, 98
K: 91, 97, 41, 75, 95, 88, 63, 93, 71, 66, 73, 77
R: 14, 86, 58, 82, 42, 50, 23, 42, 42, 20
W: (no entries)

The vertical column is the amino-side residue of the peptide bond and the horizontal row is the carboxy-side
residue. Degradation ratios (%) were roughly calculated from the peak heights in the mass spectra.


Table 3. Peptide sequences and degraded residues

Peptide                              Residue
ARGIKGIRGFSG                         R
LEDGPKFL                             D
YGGFMRRVGRPE                         R
RPPGFSPFR                            R
YGGFLRRIRPKLKWDNQ                    R
DRVYVHPFNL                           R
YGGFM                                G
EAKSQGGSN                            A
YGGFL                                G
NRVYVHPFHL                           R
NRVYVHPFHL                           V
YGGFLRRIRPKLKWDNQ                    S
PRLIEDAEYAARG                        R
WAGGDASGE                            S
GIGKFLHSAGKFGKAFVGEIMKS              K
RRLIEDAEYAARG                        R
HPFHLLVY                             F
KRNKKNNIA                            K
RFA                                  R
RPKQQGFFG                            R
LWMRFA                               M
VGKVTVN                              V
MRFA                                 R
HSQGTFTSDYSKYLDSRRAQDFVQWLMNT        R
RPPGFSPFR                            R
KKKHPDYI                             K

Underlining in the peptide sequences marks residues identified by successive degradation in positive mode.

Residue: the carboxy-terminal residue of the smallest ion peak observed in positive-mode FAB mass
spectrometry.
side residue of the peptide bond, and the horizontal row is the carboxy-side residue. The
degradation ratios given are not quantitative and are therefore only indicative of the relative ease
of bond cleavage.
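The text states only that degradation ratios were roughly calculated from peak heights. One plausible reading, assumed here and not taken from the original, is that the ratio for a bond is the share of the material reaching that bond that was degraded past it, with the peak heights of all forms of a given truncated product summed. `bond_degradation_ratios` is a hypothetical helper, and the peak heights in the example are invented.

```python
from typing import Dict, Tuple


def bond_degradation_ratios(sequence: str,
                            heights: Dict[int, float]) -> Dict[Tuple[str, str], float]:
    """Rough per-bond degradation ratios (%) from ladder peak heights.

    heights maps n -> summed peak height of product 1-n (all forms of that
    length added together). For the bond between residue n (amino side) and
    residue n + 1 (carboxy side) the ratio is
        100 * H(length <= n) / H(length <= n + 1),
    i.e. the share of material reaching the bond that was degraded past it.
    If a residue pair occurs twice, the later bond overwrites the earlier.
    """
    ratios: Dict[Tuple[str, str], float] = {}
    for n in range(1, len(sequence)):
        passed = sum(h for length, h in heights.items() if length <= n)
        reached = passed + heights.get(n + 1, 0.0)
        if reached > 0:
            ratios[(sequence[n - 1], sequence[n])] = 100.0 * passed / reached
    return ratios


if __name__ == "__main__":
    # Invented peak heights for the ladder of Met-Arg-Phe-Ala.
    heights = {4: 10.0, 3: 30.0, 2: 60.0}
    for bond, pct in bond_degradation_ratios("MRFA", heights).items():
        print(f"{bond[0]}-{bond[1]}: {pct:.0f} %")
```

The heights are accumulated rather than compared pairwise because any product shorter than 1-n has necessarily passed the bond after residue n.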

Most peptide bonds are readily cleaved; however, some residue combinations
were more difficult to break. Arg residues on the amino side of peptide bonds seemed to
be somewhat resistant to degradation, as shown in Table 3. However, this is not the case,
because negative-mode analysis showed that the reaction had proceeded further, as shown
in Fig. 3.

Attention to the ion mode used in FAB mass spectrometry is important to maximize
the amount of sequence information obtained in the present analysis. We degraded the
peptide Tyr-Gly-Gly-Phe-Leu-Arg-Arg-Ile-Arg-Pro-Lys-Leu-Lys-Trp-Asp-Asn-Gln using the vapor of a
30% PFPAA acetonitrile solution at -20°C for 30 min, followed by treatment with aqueous
pyridine vapor. Mass spectra of the degraded peptide were recorded in both
positive and negative modes. Positive-mode analysis (Fig. 3a)
suggested that degradation had proceeded only as far as peptide 1-6. In fact, the shorter
products had merely lost their positive charge as a result of acylation. Negative-mode analysis


Figure 3. Positive- and negative-ion FAB mass spectra of the C-terminally degraded products of the
peptide Tyr-Gly-Gly-Phe-Leu-Arg-Arg-Ile-Arg-Pro-Lys-Leu-Lys-Trp-Asp-Asn-Gln. The peptide (5 µg) was
exposed to the vapor of a 10% PFPAA acetonitrile solution at -20°C for 1 h. The degraded product was treated with
water vapor. The product was dissolved in 2 µl of 67% acetic acid, and 1 µl of this solution was mixed with the same
volume of a matrix composed of glycerol, thioglycerol and m-nitrobenzyl alcohol. One µl of the mixture
was applied to the FAB mass spectrometer. The spectra were recorded in (a) positive mode and (b) negative mode,
both using the same product solution and matrix.
Figure 4. Matrix effect on the FAB mass spectrum of the C-terminally degraded product. ARGIKGIRGFSG was
degraded with the vapor of a 10% PFPAA acetonitrile solution and then treated with pyridine-water vapor; the
resulting mixture was dissolved in 3 µl of 67% acetic acid. Each 1 µl of solution was mixed with the
same volume of matrix and subjected to FAB mass spectrometry. Matrix: (a) glycerol:thioglycerol:
m-nitrobenzyl alcohol (1:1:1, v/v), (b) glycerol.


demonstrated that the reaction had proceeded further, to peptide 1-4 (Fig. 3b). We have
observed that this is a common phenomenon: many degradations appear to stop short of
what has actually been achieved, for example at positively charged arginine residues. Therefore mass
spectra should be recorded in both positive and negative modes to make sure that all the
sequence information has been retrieved, although positive mode is in general more sensitive than
negative mode.

Another feature of the analysis procedure that has a significant effect on the data
obtained is the matrix used (Fig. 4). A mixture of glycerol, thioglycerol and
m-nitrobenzyl alcohol in the ratio 1:1:1 (Fig. 4a) was generally found to be superior to
glycerol alone (Fig. 4b). The mixture containing m-nitrobenzyl alcohol tends to emphasize
hydrophobic peptides. When an unknown sample is analyzed, we suggest using at least
two different matrices.

C-Terminal degradation of peptides having an α-carboxyl amide group did not occur
upon exposure to perfluoroacyl anhydride vapor under the present conditions (data not
shown).
Figure 5. Water treatment simplified the degradation fragment peaks.


EXPERIMENTS TO UNDERSTAND THE REACTION MECHANISM

The C-terminal successive degradation was carried out with the vapor of perfluoroacyl
anhydrides, followed by treatment with aqueous pyridine vapor. The reaction was
carried out with 10% PFPAA at -18°C for 2 h on ARGIKGIRGFSG.

The water vapor treatment simplifies the mass spectrum, as shown in Fig. 5: most of the
-18 molecular ions were converted into the respective C-terminally truncated
molecular ions, and part of the acylated molecular ions were deacylated. The
recoverable -18 molecular ions may be due to the formation of oxazolones (or mixed anhydrides)
at the C-terminal α-carboxyl groups, while the unstable acylation may be
O-acylation of the oxazolone and/or of the hydroxyl group of Ser or Thr residues.

Evidence that the intermediate degradation products are oxazolones was obtained
by converting them to the corresponding propyl esters. Successive degradation of the
octapeptide His-Pro-Phe-His-Leu-Leu-Val-Tyr with the vapor of a 15% HFBAA-acetonitrile
solution at -20°C for 30 min was followed by esterification of the degradation products with
propanol vapor at 60°C for 15 min. The products were analyzed by FAB mass spectrometry
(Fig. 6). The results show that the main degradation products were acylated intermediate
compounds, such as oxazolones, which were readily converted to their propyl esters.

A similar experiment was also carried out with dimethylhydrazine, in which case
dimethylhydrazides were observed instead of propyl esters (Fig. 7).
Figure 6. FAB mass spectrum of the esterified products of His-Pro-Phe-His-Leu-Leu-Val-Tyr C-terminally
degraded with HFBAA vapor. The peptide (5 µg) was exposed to the vapor of a 15% HFBAA acetonitrile
solution (100 µl) at -20°C for 30 min. The product was cooled in liquid nitrogen, dried in vacuo and reacted
with propanol vapor at 60°C for 15 min. The reaction product was dried in vacuo and analyzed by FAB
mass spectrometry. In this figure, ‘prop’ stands for propyl ester.


Other possible sources of -18 mass ion peaks are dehydration of serine residues, of
acid amide residues and of acidic amino acid residues, converting them into dehydroalanine,
nitriles and 5(6)-membered rings, respectively (Fig. 8b). These conversions are
inferred from the appearance and disappearance of -18 mass ion peaks in
the course of the C-terminal successive degradation of the various peptides tested.

Besides the -18 and acylated molecular ions, -1 (Fig. 8a) and -46 (Fig. 8c) molecular
ions were also often seen. The following set of experiments was performed to clarify the
origin of these molecular ions.

The peptide Ala-Arg-Gly-Ile-Lys-Gly-Ile-Arg-Gly-Phe-Ser-Gly (20 µg) was degraded
using the vapor of a 30% PFPAA-acetonitrile solution at -20°C for 1 h and
then exposed to the vapor of aqueous pyridine. One twentieth of the reacted peptide was
analyzed by FAB mass spectrometry to confirm that the peptide had indeed been degraded
(Fig. 9a). The rest of the degraded peptides were fractionated by HPLC (Fig. 9b). The isolated
fractions were subjected to FAB mass spectrometry and identified as the acylated products
corresponding to the respective sequential degradation products. In addition to the acylated
sequences, -1 mass peaks were also found (Table 4). These -1 mass peaks are due to cleavage
at the amino group, resulting in acid amides (Fig. 8a), rather than cleavage at the peptide
bond. This cleavage stops further successive degradation.

The -46 peak was thought to be due to decarboxylation of the C-termini of the
degradation products. This was tested by degrading the tetrapeptide Met-Arg-Phe-Ala under
the standard C-terminal degradation conditions and fractionating the degradation products
by HPLC. Figure 10a shows the elution profile at 280 nm. The λmax of a phenyl group
conjugated with a double bond is 280 nm, whereas that of an isolated phenyl group is 254 nm, with a low
extinction coefficient. Therefore the major peak in the chromatogram at 280 nm corresponds to
the degraded peptide 1-3, acyl-Met-Arg-NH-CH=CH-C6H5 (Fig. 8c). FAB mass spectrometry
of this peak showed a molecular ion peak at 553 (Fig. 10b), which corresponds to
the calculated mass of the above degraded compound (552.9). Taken together, these
experiments confirm the nature of the -46 mass peak.
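The mass shifts discussed above can be used to annotate observed ions systematically. The sketch below computes [M+H]+ values of the N-acylated truncation products 1-n, assuming monoisotopic masses and a single pentafluoropropionyl group (+145.979 Da), which is consistent with the calculated masses in Table 4 (for example, 392.1 for the acylated 1-2 product of the dodecapeptide). An observed m/z is then matched against each product with the shifts -18, -1, -46, +O and, for the esterification experiment, +42 (propyl ester). The helpers `acyl_ladder` and `annotate`, the shift labels and the 0.3 Da tolerance are assumptions.

```python
from typing import Dict, List

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728
PFP_ACYL = 145.97911  # pentafluoropropionyl (C2F5CO-) replacing one H
HFB_ACYL = 195.97591  # heptafluorobutyryl (C3F7CO-) replacing one H

# Mass shifts (Da) relative to the acylated product 1-n.
SHIFTS = {
    "": 0.0,
    " -18 (oxazolone or dehydration)": -18.01056,
    " -1 (C-terminal amide)": -0.98402,
    " -46 (decarboxylation to enamine)": -46.00548,
    " +O (oxidation)": 15.99491,
    " +42 (propyl ester)": 42.04695,
}


def acyl_ladder(seq: str, acyl: float = PFP_ACYL) -> Dict[str, float]:
    """[M+H]+ of the singly N-acylated C-terminal truncation products 1-n."""
    ladder: Dict[str, float] = {}
    residues = 0.0
    for n, aa in enumerate(seq, start=1):
        residues += RESIDUE_MASS[aa]
        ladder[f"1-{n}"] = residues + WATER + acyl + PROTON
    return ladder


def annotate(observed: float, ladder: Dict[str, float], tol: float = 0.3) -> List[str]:
    """All (product, shift) combinations within tol of an observed m/z."""
    return [label + shift
            for label, mz in ladder.items()
            for shift, delta in SHIFTS.items()
            if abs(observed - (mz + delta)) <= tol]


if __name__ == "__main__":
    ladder = acyl_ladder("MRFA")
    print(annotate(553.2, ladder))  # ['1-3 -46 (decarboxylation to enamine)']
```

For Met-Arg-Phe-Ala, an ion at m/z 553.2 is assigned to product 1-3 minus 46 Da, in line with the structure acyl-Met-Arg-NH-CH=CH-C6H5 proposed above; for HFBAA experiments, `acyl=HFB_ACYL` would be passed instead.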
Figure 7. FAB mass spectra of the degradation product derivatized with dimethylhydrazine vapor. Successive
degradation was performed with a 30% PFPAA acetonitrile solution at -20°C for 30 min. The degradation product
was immediately exposed to dimethylhydrazine vapor.
Figure 8. Possible schemes in the reaction of perfluoroacyl anhydride vapor on peptides.


C-Terminal  Sequencing  Method  Using  Perfluoroacyl  Anhydrides  Vapor 




Table 4. FAB mass spectrometric identification of the HPLC-separated fragments of C-terminally degraded products.

Peak No.   Observed mass   Sequence   Calculated mass
1          391.3           1-2†       391.1
2          449.2           1-3        449.2
3          392.2           1-2        392.1
4          1220.3          1-10       1220.6
5          746.3           1-6†       746.4
6          689.3           1-5†       689.4
7          747.3           1-6        747.4
           1218.5          1-12       1218.7 (not acylated)
8          690.4           1-5        690.4
           747.3           1-6        747.4
9          1072.1          1-9†       1072.6
10         561.2           1-4†       561.2
           746.3           1-6†       746.4
           1015.7          1-8†       1015.5
11         1016.4          1-8        1016.5
           1073.5          1-9        1073.6
12         1016.4          1-8        1016.5
           1073.3          1-9        1073.6
13         1017.5          1-8        1016.5
           1074.5          1-9        1073.6
14         562.5           1-4        562.2
           859.5           1-7†       859.4
15         562.2           1-4        562.2
16         562.1           1-4        562.2
17         860.5           1-7        859.4
           1364.5          1-12       1364.7
18         860.5           1-7        859.4
           1308.0          1-11       1307.7
19         1219.8          1-10+      1220.6
20         1219.7          1-10+      1220.6

The truncated degradation products were separated by reverse-phase HPLC (Fig. 9b). Fractions numbered
in the HPLC profile were identified by FAB mass spectrometry. The degraded products were acylated
except 1-12 (peak 7).

† indicates the -1 mass products.
Figure 9. FAB mass spectrum and HPLC profile of the C-terminally degraded dodecapeptide. The dodecapeptide
Ala-Arg-Gly-Ile-Lys-Gly-Ile-Arg-Gly-Phe-Ser-Gly (20 µg) was degraded with a 30% PFPAA acetonitrile
solution at -20°C for 1 h in the vapor phase, followed by water treatment. After the reaction, an aliquot (1/20) of the
product was analyzed by FAB mass spectrometry. The rest of the product was separated by reverse-phase
HPLC. (a) FAB mass spectrum of the degraded product mixture; (b) elution profile of the product on
reverse-phase HPLC. HPLC conditions: system, 600E (Waters-Millipore, USA); column, TSK-Gel
(4.6 × 250 mm; TOSOH, Japan); flow rate, 0.8 ml/min; solvents, TFA and 0.1% TFA in 80% aqueous
acetonitrile; gradient, linear from 0 to 48% acetonitrile over 60 min. The numbered
chromatographic peaks were analyzed by FAB mass spectrometry; the results are summarized
in Table 4. Most of the unnumbered peaks are acyl derivatives of the amino acids removed from the truncated C-termini.
Figure 10. C-terminal successive degradation fragments of Met-Arg-Phe-Ala were analyzed by HPLC, and the major peak detected at 280 nm was analyzed by mass spectrometry. C-terminal degradation of the peptide (20 μg) was carried out with the vapor of a 30% PFPAA acetonitrile solution (100 μl) for 1 h at -20°C. HPLC was performed with the SMART HPLC system under the following conditions: column, μRPC C2/C18 PC 3.2/3 (2.1 mm × 100 mm, Pharmacia); flow rate, 0.2 ml/min; solvents, 0.1% TFA in water and 0.1% TFA in acetonitrile. A linear gradient of acetonitrile (5-60%) was run from 5 min to 17.5 min, followed by isocratic elution from 17.5 min to 20 min. The chromatogram was monitored at 215 nm (data not shown) and 280 nm (panel a). Peaks were fractionated and collected; after drying, the major fraction (marked *) was analyzed by FAB mass spectrometry (panel b).
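The gradient programs in the Fig. 9 and Fig. 10 captions are piecewise-linear timetables. A minimal sketch for interpolating the solvent composition at any time is shown below; it assumes the composition is held at the first listed value before the program starts and at the last value afterwards. The Fig. 10 caption does not state the 0-5 min condition, so the 5% hold used here is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Timetable = List[Tuple[float, float]]  # (time in min, % organic solvent)


def percent_b(timetable: Timetable, t: float) -> float:
    """Linearly interpolate the solvent composition at time t (min)."""
    if not timetable:
        raise ValueError("timetable must not be empty")
    times = [point[0] for point in timetable]
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("timetable times must be non-decreasing")
    if t <= times[0]:
        return timetable[0][1]
    if t >= times[-1]:
        return timetable[-1][1]
    i = bisect_right(times, t)  # index of the first point later than t
    (t0, b0), (t1, b1) = timetable[i - 1], timetable[i]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


if __name__ == "__main__":
    # Fig. 10 program; the 0-5 min hold at 5% is an assumption.
    fig10 = [(0.0, 5.0), (5.0, 5.0), (17.5, 60.0), (20.0, 60.0)]
    # Fig. 9 program: 0 to 48% acetonitrile over 60 min.
    fig9 = [(0.0, 0.0), (60.0, 48.0)]
    for minute in (2.0, 11.25, 18.0):
        print(minute, percent_b(fig10, minute))
    print(30.0, percent_b(fig9, 30.0))
```

For the Fig. 10 program, 11.25 min lies halfway through the 5-60% ramp, so the sketch returns 32.5% acetonitrile.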


APPLICATION  TO  BIG  PROTEINS 

Applying this method to protein C-terminal sequencing is not straightforward, because the masses of typical proteins are too large to analyze directly by FAB mass spectrometry. Electrospray ionization (ESI) mass spectrometry has been applied successfully to the direct measurement of protein molecular weights, although analyzing a sample that contains a mixture of species with similar molecular masses, such as truncated molecules, is not always easy.

We tested the three strategies for protein C-terminal sequencing illustrated in Fig. 11: (1) the protein is first fragmented, and the C-terminal fragment is isolated by a specific method and then degraded with perfluoroacyl anhydride vapor; (2) the protein is first fragmented, and the whole fragment mixture is degraded with the anhydride vapor, using a specific fragmentation that leaves the non-C-terminal fragments unreactive; (3) the protein is first degraded with the anhydride vapor, the truncated mixture is then fragmented, and the truncated C-terminal fragments are analyzed.

Figure 11. Three strategies for protein C-terminal sequencing.

Following strategy (1), we fragmented a protein and selectively isolated the C-terminal peptide. After various trials, we selected classical cyanogen bromide (CNBr) cleavage, which is specific for the methionyl peptide bond. The cleaved peptide fragments were selectively fractionated into non-C-terminal fragments and the C-terminal peptide by covalent coupling to N-(2-aminoethyl)-3-aminopropyl glass (APG, LKB Biochrom Ltd.). The C-terminal peptide was then successively degraded with perfluoroacyl anhydride vapor and analyzed by FAB mass spectrometry.

Figure 12. FAB mass spectrum of the truncated C-terminal peptide of cytochrome c. Five nmol of cytochrome c (horse heart) were cleaved by CNBr at room temperature for 16 h. The mixture was evaporated and treated with anhydrous TFA for 30 min. The product solution was mixed with APG pre-equilibrated with 2% triethylamine in DMF and incubated for 2 h at 45°C. The products were dissolved in water, and DMF was then added. The APG was washed with DMF and 0.1 M pyridine-collidine buffer (pH 8.2). The unbound and wash fractions were collected and evaporated in a small test tube. The degradation was performed with 30% PFPAA in acetonitrile at -20°C for 1 h, followed by aqueous pyridine vapor treatment.

A model dodecapeptide, YGGFMRRVGRPE, was cleaved by CNBr. The resulting N-terminal pentapeptide, ending in homoserine, was converted to homoserine lactone with TFA at 20°C for 30 min. The APG supernatant was analyzed by FAB mass spectrometry, and the results showed the complete C-terminal heptapeptide sequence (Nabuchi et al., 1994).
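The fractionation in strategies (1) and (2) rests on simple sequence bookkeeping: CNBr cleaves on the C-terminal side of Met, so every fragment except the protein's own C-terminal peptide ends in homoserine (lactone after TFA treatment), which is what APG captures and what leaves those fragments unreactive in strategy (2). The sketch below is idealized (complete cleavage at every internal Met, no side reactions, a C-terminal Met left unchanged), and its function names are hypothetical.

```python
from typing import List, Tuple


def cnbr_fragments(seq: str) -> List[Tuple[str, bool]]:
    """Split a sequence after every internal Met, as idealized CNBr cleavage.

    Returns (fragment, ends_in_hsl) pairs. A cleaved Met is written as a
    trailing '(Hsl)' because CNBr converts it to homoserine/homoserine
    lactone; only the last fragment keeps the protein's free C-terminus.
    A C-terminal Met is left unchanged in this sketch.
    """
    fragments = []
    start = 0
    for i, aa in enumerate(seq):
        if aa == "M" and i < len(seq) - 1:
            fragments.append((seq[start:i] + "(Hsl)", True))
            start = i + 1
    fragments.append((seq[start:], False))
    return fragments


def c_terminal_peptide(seq: str) -> str:
    """Return the only fragment that does not end in homoserine lactone."""
    free = [frag for frag, has_hsl in cnbr_fragments(seq) if not has_hsl]
    return free[0]


if __name__ == "__main__":
    for frag, has_hsl in cnbr_fragments("YGGFMRRVGRPE"):
        print(frag, "-> bound by APG" if has_hsl else "-> C-terminal peptide")
```

For the model peptide YGGFMRRVGRPE it returns the pentapeptide YGGF(Hsl), which would be captured by APG, and the free C-terminal heptapeptide RRVGRPE.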

Sheep myoglobin was subjected to C-terminal sequencing. Its C-terminal peptide (residues 143-153) was isolated by APG treatment and exposed to HFBAA vapor. Analysis by FAB mass spectrometry revealed a C-terminal sequence of four amino acid residues (Nabuchi et al., 1994). A second protein, cytochrome c, was sequenced after isolation of its C-terminal peptide (residues 81-104). Reaction with the vapor of 30% PFPAA in acetonitrile at -20°C for 1 h yielded the C-terminal sequence of seven amino acids, as shown in Fig. 12.
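Reading a C-terminal sequence from spectra such as the one in Fig. 12 comes down to matching the mass difference between successive truncation products to a residue mass. The sketch below is a minimal illustration, not the authors' software: it assumes monoisotopic residue masses, singly charged ions listed from longest to shortest, and an arbitrary 0.3 Da default tolerance. It reports every residue within tolerance, so isobaric Leu/Ile (and, at this tolerance, Lys/Gln) come out as ambiguous calls, an inherent limitation of reading mass ladders.

```python
# Monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(masses, tol=0.3):
    """Assign residues to successive mass differences of a C-terminal ladder.

    `masses` are m/z values of the longest peptide followed by its
    successively truncated products. Returns one call per step, in C- to
    N-terminal order; ambiguous calls are joined with '/', and '?' marks a
    difference that matches no residue within `tol`.
    """
    calls = []
    for longer, shorter in zip(masses, masses[1:]):
        delta = longer - shorter
        hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls
```

For example, the calculated acylated masses listed earlier for fragments 1-12, 1-11, 1-10, and 1-9 of the model dodecapeptide give `read_ladder([1364.7, 1307.7, 1220.6, 1073.6])` = `['G', 'S', 'F']`, that is, ...Phe-Ser-Gly read from the C-terminus.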

For strategy (2), rice plastocyanin (10 μg) was cleaved with CNBr at 20°C for 72 h, and the reaction mixture was dried and treated with TFA at 20°C for 30 min. The homoserine lactones at the C-termini of the non-C-terminal peptides are insensitive to the present perfluoroacyl anhydride reaction, so the mixture was subjected to the anhydride reaction without purification of the C-terminal peptide. Despite a high background, including matrix lines, four amino acids of C-terminal sequence were determined (Fig. 13).

For strategy (3), the same protein was degraded directly with the vapor of 30% PFPAA in acetonitrile at -18°C for 2 h. Analysis of the truncated protein gave the C-terminal tetrapeptide sequence (Fig. 14). Further work on protein C-terminal sequencing will be carried out.
Figure 13. FAB mass spectrum of the successively degraded C-terminal peptide of plastocyanin. Rice plastocyanin, which contains a single Met at position 90, was cleaved with CNBr. The peptides were dried and treated with TFA at 20°C for 30 min, and the reaction mixture was subjected directly to the anhydride reaction. M denotes a matrix line.


Figure 14. FAB mass spectrum of the C-terminal degradation of CNBr fragments of plastocyanin. Denatured plastocyanin (10 μg) was treated with the vapor of 30% PFPAA in acetonitrile at -18°C for 2 h. After aqueous pyridine vapor treatment, the product was subjected to CNBr cleavage. M denotes a matrix line.
INTRODUCTION

In 1992 we reported a new method of sequencing proteins from the carboxy-terminus (C-terminus) (Boyd et al., 1992). In the past 2 years, we have continued our investigations, including the mechanism of the initial activation of the C-terminal carboxyl group and the deliberate modification of reactive amino acid side-chains by the sequencing reagents. Through the reagents and reaction conditions selected for our sequencing protocol, aspartic acid, glutamic acid, serine, and threonine are derivatized. The amidation of aspartic and glutamic acid and the acetylation of serine and threonine have led to improved yields in sequencing these residues. Aspartic and glutamic acid are now categorized, as seen in Table 1, as amino acid residues that are readily sequenced. Our criterion for calling a residue reliably is the ability to sequence through and detect that residue when it is present in one nanomole of a protein sample. On average, it is possible to sequence 5 cycles on one nanomole of protein applied to a polyvinylidene difluoride (PVDF) membrane if the amino acid sequence contains those residues listed in the “reliably called” column of Table 1. Our focus for the 1994 MPSA conference is to illustrate the current utility of this C-terminal sequencing method for proteins immobilized onto PVDF.


SEQUENCING  METHOD 

Our chemical approach for sequencing proteins from the C-terminus, first reported 2 years ago, is presented in Scheme 1 (Boyd et al., 1992). As in the Schlack and Kumpf method (Schlack and Kumpf, 1926), the C-terminus is first derivatized into a thiohydantoin (TH). In the Schlack and Kumpf approach, the amino acid-TH is cleaved and the truncated C-terminus is returned to a carboxylic acid. A unique feature of our sequencing method (Scheme 1) is that the C-terminal TH is alkylated. Alkylation yields an alkylated thiohydantoin (ATH) that is cleaved from the C-terminus of the protein more readily than the parent TH. The ATH is cleaved by thiocyanate anion (NCS⁻) under acidic conditions. An important advantage of our method is that while NCS⁻ cleaves the ATH, the amino acid residue adjacent to the ATH is simultaneously derivatized into a TH. This efficient and selective cleavage, combined with the simultaneous formation of a new (n - 1) proteinyl-TH, bypasses the need to return the C-terminus to a carboxylic acid. The sequencing method shown in Scheme 1 has sequenced up to ten cycles on protein samples noncovalently attached to PVDF. These data are presented herein.

Methods in Protein Structure Analysis, Edited by M. Z. Atassi and E. Appella
Plenum Press, New York, 1995

V. L. Boyd et al.

Table 1. Alkylated thiohydantoin amino acids

Reliably called
In development
Stops sequencing

Alanine
Arginine
Asparagine
Aspartic acid
Glutamic acid
Glutamine
Histidine
Isoleucine
Leucine
Lysine
Methionine
Glycine
Phenylalanine
Tryptophan
Tyrosine
Valine
Cysteine
Serine
Threonine
Proline
Scheme 1. The Applied Biosystems alkylation chemistry for automated C-terminal protein sequence analysis.
RESULTS AND DISCUSSION

Using nuclear magnetic resonance (NMR) spectroscopy, we observed that activating reagents such as tetramethylchlorouronium chloride (Bozzini et al., 1992; Boyd et al., 1992) and diphenylchlorophosphate (Guga et al., 1993), under basic conditions with diisopropylethylamine (DIEA), converted the C-terminus entirely into a peptidyl-oxazolone. In our protocol, the oxazolone is reacted with NCS⁻ in a separate step under acidic conditions (TFA) to form a peptidyl-TH. Acidic conditions reportedly favor this cyclization (Inglis, 1991).

The NMR studies also revealed that, under basic conditions, the oxazolone forms an adduct with excess activating reagent (Boyd et al., manuscript in preparation). Basic conditions also promote diketopiperazine formation at the C-terminus. Both of these side-reactions of the oxazolone retard or prevent proteinyl-TH formation. Using a weaker base, such as lutidine, and a less reactive activating reagent, such as acetic anhydride (Ac2O), suppresses these oxazolone side-reactions.

The carboxylic acid side-chains of aspartic and glutamic acid residues also react with Ac2O, forming mixed anhydrides. Scheme 2 portrays the formation of an oxazolone at the C-terminus and of a mixed anhydride at a glutamic acid side-chain. The NH3 formed by dissociation of ammonium thiocyanate (NH4SCN) was observed to react with the mixed anhydride, but not with the ionized oxazolone at the C-terminus, while conditions were still basic. The resulting amidation converts the aspartic and glutamic acid side-chains into asparagine and glutamine. If piperidine thiocyanate is used in place of NH4SCN, aspartic and glutamic acid are converted to their piperidine amides. To avoid possible amidation at the C-terminus, tetrabutylammonium thiocyanate (NBu4NCS) is used as the source of NCS⁻ in every step of our sequencing protocol in which proteinyl-TH is formed. Amidation of glutamic and aspartic acid residues is preferably carried out after proteinyl-TH formation.

Scheme 2. Activation of side-chain carboxylic acid vs. C-terminus.

Acetic anhydride also acetylates the hydroxyl groups of serine and threonine, the phenol group of tyrosine, and the epsilon amine group of lysine. Except when serine or threonine is located at the C-terminus, the hydroxyl groups of serine and threonine interfere with the present sequencing method (Boyd et al., 1992). Acetylation of the hydroxyl group prevents displacement of the alkylated sulfur atom of an adjacent ATH residue during sequencing, so “capping” the serine and threonine residues eliminates this interfering side-reaction. At present, DIEA is used with Ac2O for the deliberate acetylation of the hydroxyl groups. A reduced yield is typically observed in cycles following a serine or threonine. The ATH derivative of serine or threonine, if detected, corresponds to the dehydro-analog. Whether dehydrated serine and threonine present in a protein prior to sequencing interfere with the sequencing method has not yet been determined. A protein with multiple serine or threonine residues near the C-terminus remains difficult to sequence.

Acetylation of the epsilon amine group of lysine yields an ATH derivative that co-elutes with an artifact peak in our current chromatography system. At present, lysine residues are therefore converted into phenylureas with phenylisocyanate (PIC) prior to sequencing. Reproducible HPLC peaks are observed for tyrosine and arginine residues when acetic anhydride is used for the initial activation. The independent synthesis of ATH reference standards for acetylated tyrosine and arginine is in progress.

Figure  1  illustrates  the  aspartic  acid  residue  in  cycle  3  of  enolase  (....G-D-K-F) 
amidated  to  the  piperidine  amide  during  the  initial  cycle  of  our  sequencing  protocol.  The 
ATH  derivative  for  the  amidated  aspartic  acid  residue  is  clearly  identified  during  sequencing. 
Our  amidation  procedure  has  been  consistently  successful  on  all  proteins  containing  aspartic 
and  glutamic  acid  sequenced  to  date. 

Horse heart cytochrome c (....L-K-K-A-T-N-E, Figure 2) has a C-terminal glutamic acid and a threonine in cycle 3. The glutamic acid is observed as both the piperidine amide and the free acid because amidation is incomplete. The hydroxyl group of threonine is acetylated during the initial cycle of sequencing. A C-terminal aspartic or glutamic acid residue can interfere with initial proteinyl-TH formation (Stark, 1968) and likely reduces the initial yield in this example. As described above, the ATH of threonine forms the dehydro-analog and often is not detected. Nevertheless, in this example (2 nmol of cytochrome c on PVDF) the amino acid sequence could be determined for 7 cycles.

RNase  (....F-D-A-S-V,  Figure  3)  has  both  a  serine  and  an  aspartic  acid  residue.  Prior 
to  the  recent  improvements  in  our  sequencing  protocol,  it  was  not  possible  for  us  to  sequence 
this  protein.  After  acetylation  of  the  hydroxyl  group  of  serine  and  amidation  of  the  aspartic 
acid  into  a  piperidine  amide,  the  sequence  could  be  determined  for  five  cycles.  The  decline 
in  yield  in  sequencing  through  a  serine  residue  occurs  as  described  above,  but  does  not 
terminate  the  sequencing. 

The  final  two  protein  sequencing  examples  are  included  to  illustrate  the  improved 
sequencing  performance  relative  to  our  publication  2  years  ago. 

One nanomole of beta-lactoglobulin (BLG) was applied to a PVDF membrane and sequenced for 7 cycles. The fifth and sixth residues from the C-terminus of BLG are glutamic acid residues, both of which are detected as the piperidine amides in Figure 4. The ATH of histidine in cycle 2 is also clearly seen in this protein. The cystine in cycle 3 was not reduced prior to sequencing, so no signal was detected. When cysteine is present in a protein, HPLC peaks corresponding to both dehydroalanine and the S-alkylated ATH derivative have been detected (Guga et al., 1993).

Figure 1. C-terminal sequence analysis of 2 nmol of yeast enolase applied to PVDF. The lysine ATH residue in cycle 2 is the phenylurea derivative and the aspartic acid in cycle 3 is the piperidine amide derivative.

Figure 2. C-terminal sequence analysis of 2 nmol of horse heart cytochrome c applied to PVDF.

Figure 5. C-terminal sequence analysis of horse apomyoglobin, 500 pmol applied to PVDF.

The C-terminal sequence data for apomyoglobin (500 pmol) are presented in Figure 5. The amino acid sequence could be determined for ten cycles.
SUMMARY

At the seventh symposium of the Protein Society (July 1994), we demonstrated the utility of our sequencing method for the characterization of genetically engineered proteins, including the sequencing of samples electroblotted onto PVDF (Bozzini et al., 1994). In this article, we demonstrate improvements in our ability to sequence amino acids with reactive side-chain groups and to sequence protein samples at higher sensitivity.

Acetic anhydride, a reagent with a long history of use for the activation of carboxyl groups as well as for the acetylation of serine, threonine, lysine, and tyrosine, has been integrated into our sequencing protocol. The suitability of acetic anhydride for protein modifications, including proteinyl-TH formation, is well documented. Acetylation of the serine and threonine hydroxyl groups has enabled sequencing through these residues in some of the proteins we have sequenced to date. Acetylation of the epsilon amine group of lysine is expected to make phenylisocyanate pretreatment unnecessary; the resolution of acetylated lysine ATH in the HPLC separation system is currently being optimized. Using a single reagent, acetic anhydride, both to activate the carboxylic acid groups and to acetylate reactive side-chains reduces the number of reagents to which the protein is exposed.

The 500 pmol, 10-cycle sequencing example of apomyoglobin illustrates our progress in developing the sequencing protocol. Our continued development toward sequencing through all of the amino acid residues, together with enhanced sensitivity, will expand the utility of this C-terminal sequencing method.
INTRODUCTION

An automated carboxy-terminal (C-terminal) protein sequencing technology developed by Hewlett-Packard enables the direct confirmation of the C-terminal sequence of native and expressed proteins, the detection and characterization of protein processing at the C-terminus, the identification of post-translational proteolytic cleavages, and partial sequence information on amino-terminally blocked protein samples. The approach offers sequence analysis through each of the 20 common amino acid residues, including proline, which has historically been highly problematic. Additionally, the protein samples that can typically be analyzed span a usefully broad range of molecular weight and structural complexity.

The automated sequencing chemistry of the HP G1009A C-terminal protein sequencing system uses diphenyl phosphoroisothiocyanatidate (DPPITC) as the coupling reagent and potassium trimethylsilanolate (KOTMS) as the cleavage reagent for the efficient generation of thiohydantoin-amino acid derivatives (TH-aa) (Bailey, J. M., et al., 1990, 1992). The automated HPLC analysis of the sequencing cycles is performed with the HP 1090M liquid chromatograph and the HP specialty PTH analytical HPLC column (HP technical note, 1994).


MATERIALS  AND  METHODS 

Protein samples were applied to precut Zitex membranes and inserted into C-terminal sequencer columns for sequence analysis using the HP G1009A C-terminal protein sequencing system (H-P, Palo Alto, CA). The system consists of the HP G1000A C-terminal protein sequencer, the HP 1090M liquid chromatograph, an HP Vectra 486/66 computer with the Microsoft® MS-DOS® and Windows™ environment, and HP specialty C-terminal sequencing reagents, solvents, HPLC columns and solvents, and all related consumables.

Protein samples were obtained from Sigma Chemical Co. (St. Louis, MO). Thiohydantoin-amino acid derivatives were prepared for and quality-controlled by HP (3). The C-terminal chemical sequencing method was developed for automation using a chemistry licensed to HP from the City of Hope, Duarte, CA.


RESULTS  AND  DISCUSSION 
Sequencing  Criteria 

The principal requirement for any chemical sequencing methodology is to enable efficient, rapid, and reproducible reactions that can be applied to all 20 common amino acid residues. This applies in particular to proline, which has historically challenged all degradative sequencing schemes, yielding no identifiable derivative and halting the sequencing chemistry. The chemical cleavage of the cyclized peptidylthiohydantoin must adequately yield the thiohydantoin-amino acid derivative as well as a shortened polypeptide that is chemically suitable for the succeeding cycle of sequencing chemistry. The criteria for reliable and reproducible analyses are satisfied in part by an inert reaction support that readily immobilizes the sample without intervening covalent attachment protocols or pre-sequencing procedures that cause irreproducible and unpredictable sample losses and variable yields. This sequencing methodology uses a Zitex membrane (a porous Teflon membrane) as a non-covalent reaction support: protein samples are applied directly to the reaction membrane and are adsorptively immobilized for the chemical sequencing. Additionally, protein sequence analysis relies on a stable and reproducible chromatographic method for the analysis of the thiohydantoin-amino acid derivatives.

Sample  Application  on  Zitex  Membranes 

The protein samples for C-terminal sequencing are conveniently applied directly to Zitex reaction membranes (1 mm × 12 mm) that have been pre-treated with alcohol (isopropanol). The liquid sample solutions, occasionally made homogeneous by adding a small volume (1-5 μl) of dilute aqueous trifluoroacetic acid, are applied to the wetted Zitex membrane in 5-20 μl volumes and allowed to dry either at room temperature (10-20 minutes) or at a controlled elevated temperature.

The dry membrane is inserted into a C-terminal sequencer column (inert Kel-F columns fitted with inert end frits) and installed in any one of the four sample positions on the HP G1009A sequencer. The sequencer column reactions that occur on the Zitex membrane include the chemical coupling and cyclization of the C-terminal residue and the cleavage and extraction that release the thiohydantoin-amino acid derivative. The thiohydantoin-amino acids are extracted off the Zitex membrane from the sequencer column to the sequencer transfer flask in preparation for HPLC injection.

The  sample  application  is  compatible  with  diverse  samples  recovered  in  various 
buffers  (phosphate,  inorganic  salts)  and  solvents  (HPLC  fractions).  Samples  that  have  been 
subjected  to  amino-terminal  sequence  analysis  using  the  HP  G1005A  N-terminal  protein 
sequencing  system  and  Zitex  as  a  reaction  support  may  be  transferred  to  C-terminal 
sequencer  columns  and  subjected  to  C-terminal  sequence  analysis  with  the  HP  G1009A 
C-terminal  sequencing  system. 
C-Terminal Coupling and Cyclization Reactions

Diphenyl phosphoroisothiocyanatidate (DPPITC) in the presence of pyridine constitutes the new chemical coupling and cyclization reactions for HP G1009A automated C-terminal sequence analysis (1). A prerequisite for the coupling reaction with DPPITC is base activation of the protein C-terminal carboxylic acid moiety to a carboxylate species. The membrane-adsorbed protein sample is treated with a suitable base such as diisopropylethylamine or trimethylsilanolate. The carboxylate salt of the C-terminal amino acid residue is highly reactive toward the diphenyl phosphoroisothiocyanatidate coupling reagent and is thought to generate a reactive pentavalent species that collapses to the C-terminal acylisothiocyanate. The Zitex membrane is washed with organic solvent (acetonitrile) to remove the excess DPPITC.

The coupled peptidylacylisothiocyanate is treated with pyridine to induce five-membered ring closure to the carboxy-terminal peptidylthiohydantoin. Pyridine has been found to promote efficiently the cyclization of the acylisothiocyanate to the acylthiohydantoin product; its effectiveness in this reaction can be attributed to its nucleophilic properties in addition to its basicity. The coupled and cyclized membrane-bound peptidylthiohydantoin is washed with organic solvent to remove the excess pyridine and the resulting reaction by-products.

Additional treatment of the peptidylacylisothiocyanate with liquid anhydrous trifluoroacetic acid enables the cyclization of a C-terminal prolylisothiocyanate to the corresponding prolylthiohydantoin (Bailey, J. M., et al., 1995). This species is readily cleaved by treatment with 2% aqueous trifluoroacetic acid vapor (and trimethylsilanolate treatment) to yield the thiohydantoin-proline derivative. A methanolic extraction of the thiohydantoin-proline from the reaction column to the transfer flask is followed by the column cleavage reaction and solvent extraction, which complete the chemical degradation cycle.

Cleavage  Reaction  of  the  Peptidylthiohydantoin 

The peptidylthiohydantoin (coupled and cyclized product) is chemically cleaved into the C-terminal thiohydantoin-amino acid and the shortened polypeptide using an alkali salt of trimethylsilanolate (KOTMS). The cleavage reagent is a highly reactive nucleophile that displaces the thiohydantoin derivative from the C-terminal acylthiohydantoin moiety. The resulting trimethylsilyl ester is rapidly cleaved to a free C-terminal carboxylate species, ready for the next cycle of chemical coupling with DPPITC. The released thiohydantoin-amino acid is extracted off the Zitex membrane to the transfer flask with the cleavage solution and a subsequent organic solvent (acetonitrile) wash. The extraction solvents are evaporated, and the thiohydantoin-amino acid is redissolved in the HPLC transfer solvent (dilute aqueous trifluoroacetic acid) and injected into the HP 1090M HPLC for analysis.

HPLC  Analysis  of  Thiohydantoin-Amino  Acid  Derivatives 

The HP G1009A C-terminal protein sequencing system provides automated HPLC analysis of sequencer cycles using the HP 1090M liquid chromatograph with filter photometric detection at 269 nm and the HP specialty (2.1 mm × 25 cm) reversed-phase PTH analytical HPLC column (3). A 39-minute binary gradient (Solvent A: phosphate buffer, pH 2.9; Solvent B: acetonitrile/water) uses an ion-pairing reagent (alkyl sulfonate), giving highly reproducible elution times and peak resolution. A stable thiohydantoin-amino acid standard mixture is incorporated on the sequencer for on-line automated peak calibration and quantitation.

Figure 1. Thiohydantoin-amino acid standard.

The thiohydantoin-amino acid standard mixture (TH-Std) consists of the synthetic thiohydantoin derivatives corresponding to the actual sequencing products resulting from chemical sequence analysis. In particular, the serine, threonine, cysteine, and lysine thiohydantoin derivative standards correspond to their respective sequencing degradation products. The sequencing product derivatives of serine and cysteine yield the same degradation species and, without cysteine side chain modification, permit the identification of either residue for confirmatory analysis of a known sequence. The residue assignment of cysteine in unknown sequences requires prior chemical modification of cysteine (an S-alkylation), as is routinely done with amino-terminal sequencing methods.


Figure 2. Chemical background of 3 blank cycles.

Figure 3. Cycle 1 of mouse immunoglobulin G samples.


The thiohydantoin-amino acid standard HPLC chromatogram (Figure 1) shows the
elution times for each of the 20 common amino acid derivatives, including thiohydantoin-Pro
(P) and the common peak designated S/C, which identifies Ser and Cys residues. The relative
retention time of the S-carboxymethyl derivative of cysteine is indicated by the arrow. The
peak identified for Lys (K) corresponds to the free ε-amino derivative of thiohydantoin-lysine.
Thiohydantoin-Ile (I) resolves chromatographically into two components representing the
structural isoforms of the residue; the Ile peaks do not coelute with any other
thiohydantoin-amino acid derivative or with the chemical background. The standard
chromatogram shown represents a 50 pmol mixture of the thiohydantoin derivatives, except
for the Ser, Cys, Thr, and Lys derivatives, which represent approximately 100 pmol each.
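Because the standard contains about 100 pmol of the Ser, Cys, Thr, and Lys derivatives but 50 pmol of the others, a single-point calibration has to use each derivative's own nominal amount. A minimal sketch, assuming a detector response that is linear through zero; the peak areas and the function names are hypothetical.

```python
from typing import Dict


def response_factors(std_areas: Dict[str, float],
                     std_pmol: Dict[str, float]) -> Dict[str, float]:
    """Peak area per pmol for each derivative in one standard run."""
    return {res: std_areas[res] / std_pmol[res] for res in std_areas}


def quantify(sample_areas: Dict[str, float],
             factors: Dict[str, float]) -> Dict[str, float]:
    """Convert sample peak areas to pmol (single-point, linear through zero)."""
    return {res: area / factors[res] for res, area in sample_areas.items()}


# Nominal standard amounts from the text: ~100 pmol Lys, 50 pmol Gly.
std_pmol = {"G": 50.0, "K": 100.0}
std_areas = {"G": 1000.0, "K": 1500.0}  # hypothetical areas
factors = response_factors(std_areas, std_pmol)
print(quantify({"G": 400.0, "K": 300.0}, factors))  # both 20.0 pmol
```

Using a flat 50 pmol for every derivative would overstate Lys in this example by a factor of two, which is why the per-residue amounts matter.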

C-Terminal  Sequence  Analysis  of  Diverse  Samples 

The chemical sequencing background generated on the Zitex reaction membrane by the
reaction by-products of the coupling reagent (DPPITC) appears chromatographically as
three principal and two minor UV-absorbing components (Figure 2). These
background-related species do not coelute with any of the 20 common thiohydantoin-amino
acid derivatives and therefore do not interfere with sequencer cycle residue assignments.

First-cycle recoveries typically correspond to initial sequencing yields of 10%-50% of the
total amount of sample applied to the Zitex membrane. As observed for amino-terminal
sequencing, the initial thiohydantoin-amino acid recoveries depend on the sample (and on
the residue).
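The initial yield quoted here is simply the first-cycle recovery expressed as a percentage of the amount applied. A minimal sketch; the function names and the example amounts are hypothetical, and the 10%-50% bounds are the typical range stated above, not an acceptance criterion.

```python
def initial_yield_percent(recovered_pmol: float, applied_pmol: float) -> float:
    """First-cycle recovery as a percentage of the sample applied to the membrane."""
    if applied_pmol <= 0:
        raise ValueError("applied amount must be positive")
    return 100.0 * recovered_pmol / applied_pmol


def in_typical_range(yield_pct: float, low: float = 10.0, high: float = 50.0) -> bool:
    """True if the yield falls within the typical 10%-50% range reported above."""
    return low <= yield_pct <= high


y = initial_yield_percent(recovered_pmol=180.0, applied_pmol=900.0)  # hypothetical
print(f"initial yield {y:.1f}%, typical: {in_typical_range(y)}")
```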

Mouse immunoglobulin G (150 kDa) was applied as a phosphate buffer solution directly
to a Zitex reaction membrane and subjected to C-terminal sequence analysis (Figure 3). The
first C-terminal cycle identified the extent of processing at the C-termini of the heavy chains
through detection and quantitation of the heavy-chain Lys (K, the expected C-terminal
residue of the full-length sequence) and Gly (G) residues. The C-terminal residue of the light
chain was identified as the expected Cys (C) residue. The arrows indicate the sequencing
chemical background. Sequence analysis of 900 pmol and 450 pmol samples of the mouse
IgG shows a linear signal response and unambiguous residue assignments at both sample
amounts (Figure 4).

Figure 4. Cycle-1 of mouse immunoglobulin G samples (top: 900 pmol; bottom: 450 pmol).
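Because Lys is the C-terminal residue of the full-length heavy chain and Gly is exposed once that Lys has been removed, the cycle-1 recoveries of the two give a rough estimate of the processed fraction. The sketch assumes equal recovery of the two thiohydantoins, which is only approximate given the sample and residue dependence noted above; the amounts and the function name are hypothetical.

```python
def fraction_lys_clipped(lys_pmol: float, gly_pmol: float) -> float:
    """Estimate the fraction of heavy chains lacking the C-terminal Lys.

    lys_pmol: cycle-1 Lys recovery (full-length chains)
    gly_pmol: cycle-1 Gly recovery (chains with Lys removed)
    Assumes both thiohydantoins are recovered with equal efficiency.
    """
    total = lys_pmol + gly_pmol
    if total <= 0:
        raise ValueError("no heavy-chain signal")
    return gly_pmol / total


frac = fraction_lys_clipped(lys_pmol=60.0, gly_pmol=140.0)  # hypothetical recoveries
print(f"{100 * frac:.0f}% of heavy chains processed")  # 70%
```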

Superoxide dismutase, an N-terminally blocked protein, was applied to a Zitex reaction
membrane in an amount of approximately 1 nmol in 10 µl of 1% aqueous trifluoroacetic
acid. The first three cycles of C-terminal sequence analysis identified Lys (K) at cycle-1, Ala
(A) at cycle-2, and Ile (I) at cycle-3 (Figure 5). The chemical background remains relatively
stable, while a thiohydantoin background increases; this increase is attributed in part to
internal cleavages, analogous to those observed with amino-terminal sequencing chemistry.

The results of C-terminal sequence analysis of a chromatographically isolated hemoglobin
B chain are shown for the first three sequencer cycles (Figure 6). Approximately 1 nmol of
protein, recovered from reversed-phase HPLC separation of the hemoglobin A and B chains,
was applied directly to a Zitex reaction membrane and immediately subjected to C-terminal
sequence analysis. The results show unambiguous identification of the first two cycles,
His-1 and Tyr-2, and confirmatory identification of the third-cycle residue, Arg-3. Sequence
analysis of the chromatographically isolated hemoglobin A chain resulted in unambiguous
residue assignments of Arg (R) at cycle-1 and Tyr (Y) at cycle-2 (Figure 7).

Figure 5. Cycles of superoxide dismutase.
Figure 6. Cycles 1-3 of hemoglobin B chain.

C-terminal sequence analysis of a series of proline-containing protein samples identified
thiohydantoin-Pro (P) at cycle-1 of each analysis (Figure 8), confirming the expected
full-length sequences. The model polypeptide polyproline was applied to a Zitex reaction
membrane and sequenced directly, as were the two additional proteins shown. Sequence
analysis of ovalbumin and apo-transferrin recovered thiohydantoin-Pro as the first-cycle
residue. The sequencing chemistry uses neat trifluoroacetic acid (as described here and
elsewhere, cf. ref 3) to generate the cleavable thiohydantoin-Pro derivative, which is released
by treatment with aqueous acidic vapor and with trimethylsilanolate. The cyclized
prolylthiohydantoin formed in neat trifluoroacetic acid is cleaved with aqueous trifluoroacetic
acid vapor; the thiohydantoin-Pro derivative is extracted with methanol to the transfer flask,
and the reaction membrane is then treated with trimethylsilanolate (as part of the routine
cleavage steps of the chemistry cycle), which cleaves any residual thiohydantoin-Pro from the
cyclized peptidylprolylthiohydantoin species.

Figure 7. Cycles 1 and 2 of hemoglobin A chain.
Figure 8. Cycle-1 of proline-containing proteins.

The results of C-terminal sequence analysis of a 1 nmol sample of bovine
beta-lactoglobulin A are shown in Figure 9. The first three cycles of analysis identified Ile (I)
at cycle-1, His (H) at cycle-2, and Cys (C) at cycle-3, confirming the expected full-length
protein sequence.

The results of C-terminal sequence analysis of a variety of low molecular weight proteins
are shown as cycle-1 chromatograms (Figure 10). Bovine insulin A and B chains (combined
mol wt 5.8 kDa) yielded Asn and Ala, respectively, and the 13.7 kDa protein bovine
ribonuclease B gave unambiguous identification of Val at cycle-1.

Figure 9. Cycles 1-3 of bovine beta-lactoglobulin A.
Figure 10. Cycle-1 of low molecular weight proteins.

Sequence analysis of approximately 1 nmol of human serum albumin (68 kDa) identified
the residues at cycles 1-3, assigned as Leu (L) at cycle-1, Gly (G) at cycle-2, and Leu (L) at
cycle-3 (Figure 11).

A 1 nmol sample of human serum albumin was applied to a Zitex reaction membrane,
inserted into an N-terminal sequencer membrane-compatible column, and installed in the
HP G1005A N-terminal protein sequencer (Miller, C. G., 1995). The sample was subjected to
five cycles of automated N-terminal sequence analysis (cycle-1, Asp (D), and cycle-2, Ala (A),
are shown), and the Zitex reaction membrane was then transferred to the HP G1009A
C-terminal protein sequencing system (Figure 12). The first two cycles of automated
C-terminal sequence analysis of the previously N-terminally sequenced sample gave
unambiguous C-terminal residue assignments of Leu (L) at cycle-1 and Gly (G) at cycle-2.
This example of integrating amino-terminal and carboxy-terminal sequence analysis on a
single sample should prove an invaluable procedure for the sequence determination and
structural identification of protein samples.

Figure 11. Cycles 1-3 of human serum albumin.
Figure 12. Human serum albumin, 1 nmol on Zitex (5 cycles N-terminal and 2 cycles C-terminal).
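One practical point when combining the two analyses is that C-terminal results are read from the C-terminus inward, so they must be reversed before being written N-to-C alongside the N-terminal calls. A small illustrative helper, using the human serum albumin calls reported above; the name `sequence_tag` and the "..." notation for the unsequenced interior are hypothetical.

```python
from typing import Sequence


def sequence_tag(n_term_calls: Sequence[str], c_term_calls: Sequence[str]) -> str:
    """Combine N- and C-terminal residue calls into one N-to-C tag.

    n_term_calls[0] is N-terminal cycle 1; c_term_calls[0] is C-terminal
    cycle 1 (the last residue of the chain), so the C-terminal list is reversed.
    '...' stands for the unsequenced interior.
    """
    return "".join(n_term_calls) + "..." + "".join(reversed(c_term_calls))


# Human serum albumin: N-terminal Asp, Ala; C-terminal Leu (cycle 1), Gly (cycle 2).
print(sequence_tag(["D", "A"], ["L", "G"]))  # DA...GL
```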


CONCLUSIONS 

The HP G1009A C-terminal sequencing system automates an efficient, reliable, and
reproducible carboxy-terminal sequencing chemistry based on diphenyl
phosphoroisothiocyanatidate as the coupling reagent, pyridine as a cyclization reagent, and
trimethylsilanolate as the cleavage reagent. The strategic incorporation of trifluoroacetic acid
into an extended cyclization scheme enables sequence analysis through all 20 common
amino acids, including proline. The methodology requires neither pre-sequencing
modification to protect side-chain residues nor covalent attachment to retain protein samples
on the reaction support. Zitex as a reaction membrane allows adsorptive immobilization of
protein samples by simple direct application to the alcohol-treated membrane. A robust HPLC
method for identifying the thiohydantoin-amino acid derivatives, together with an on-line
thiohydantoin-amino acid standard mixture, enables reliable detection and identification of
sequencer cycle residues. The sequencing system provides a flexible platform on which the
chemical methodology can be further developed and refined. It is anticipated that continued
advances in carboxy-terminal sequence analysis will approach the exceedingly high chemical
reaction efficiencies obtained for amino-terminal sequence analysis (HP G1005A). The
immediate needs for identifying C-terminal sequence, and for detecting and quantitating
protein processing at the C-terminus of native and expressed proteins and protein products,
can be met with the current HP G1009A sequencing technology.
INTRODUCTION

There has been much interest in developing a chemical method for the sequential
C-terminal sequence analysis of proteins and peptides. Such a method would be analogous
and complementary to the Edman degradation commonly used for N-terminal sequence
analysis (Edman, 1950). It would also be invaluable for sequencing proteins with naturally
occurring N-terminal blocking groups, for detecting post-translational processing at the
carboxy-terminus of expressed gene products, and for assisting in the design of
oligonucleotide probes for gene cloning. Although a number of methods have been described,
the “thiocyanate method” (Schlack and Kumpf, 1926) has been the most widely studied and
appears to offer the most promise because of its similarity to current methods of N-terminal
sequence analysis.

The  thiocyanate  method  involves  the  reaction  of  a  protein  with  an  isothiocyanate 
reagent,  in  the  presence  of  a  carboxylic  acid  activating  reagent,  to  form  a  C-terminal 
peptidylthiohydantoin.  The  C-terminal  amino  acid,  derivatized  as  a  thiohydantoin,  is  then 
specifically  removed  to  yield  a  shortened  peptide  or  protein  and  a  thiohydantoin  amino  acid. 

Many of the problems with the thiocyanate chemistry that have prevented its routine use in
the protein chemistry laboratory have been addressed over the last few years. The use of
sodium or potassium trimethylsilanolate for the cleavage reaction provided a rapid and
specific hydrolysis of the derivatized C-terminal amino acid, leaving the shortened peptide
with a free C-terminal carboxylate ready for further rounds of sequencing (Bailey et al.,
1992a). The use of diphenyl phosphoroisothiocyanatidate (DPP-ITC) and pyridine combined
the activation and derivatization steps and permitted the quantitative conversion of 19 of the
twenty common amino acids (the exception being proline) to thiohydantoin derivatives. These
improvements allowed the C-terminal chemistry to be applied to a wide variety of protein
samples with cycle times similar to those employed for N-terminal sequence analysis (Bailey
et al., 1992b). The introduction of Zitex (porous Teflon) as a support for protein sequencing
permitted C-terminal sequence analysis of protein samples applied non-covalently to the
sequencing support (Bailey et al., 1992b; Bailey et al., 1993).

The inability to derivatize C-terminal proline to a thiohydantoin has been a major impediment
to the development of a routine method for the C-terminal sequence analysis of proteins and
peptides. Since the method was first described (Schlack and Kumpf, 1926), the derivatization
of C-terminal proline has been problematic. While a few investigators have reported the
derivatization of proline to a thiohydantoin over the years, either as the free amino acid or on
a peptide (Kubo et al., 1971; Yamashita and Ishikawa, 1971; Inglis et al., 1989), others have
been unable to obtain any experimental evidence for the formation of a thiohydantoin
derivative of proline (Turner and Schmerzler, 1954; Fox et al., 1955; Stark, 1968; Bailey and
Shively, 1990). Recently, using a procedure similar to that described by Kubo et al. (1971),
Inglis et al. (1993) described the successful synthesis of thiohydantoin proline from
N-acetylproline by the one-pot reaction of acetic anhydride, acetic acid, trifluoroacetic acid,
and ammonium thiocyanate with N-acetylproline. We have reproduced this synthesis and
further developed it into a large-scale synthesis of TH-proline.

We  also  describe  the  development  of  chemistry  based  on  the  DPP-ITC/pyridine 
reaction  which  permits  the  efficient  derivatization  and  hydrolysis  of  peptidyl  C-terminal 
proline  to  a  thiohydantoin  and  discuss  the  integration  of  this  chemistry  into  an  automated 
method  for  the  C-terminal  sequence  analysis  of  polypeptides  containing  C-terminal  proline. 


MATERIALS  AND  METHODS 

Materials. Diphenyl chlorophosphate, acetic anhydride, trimethylsilylisothiocyanate,
anhydrous dimethylformamide (DMF), anhydrous acetonitrile, and anhydrous pyridine were
from Aldrich. Water was purified on a Millipore Milli-Q system. Sodium trimethylsilanolate was
obtained from Fluka. Diphenyl phosphoroisothiocyanatidate was synthesized as described
(Kenner et al., 1953). All of the peptides used in this study were obtained from either Bachem
or Sigma. N-Acetylproline was from Sigma. Diisopropylethylamine (Sequanal grade),
trifluoroacetic acid (Sequanal grade), and 1,3-dicyclohexylcarbodiimide (DCC) were obtained
from Pierce. The carboxylic acid modified polyethylene membranes were from the Pall
Corporation (Long Island, NY). Zitex G-110 was from Norton Performance Plastics (Wayne,
NJ). The amino acid thiohydantoins used in this study were synthesized as described (Bailey
and Shively, 1990). The Reliasil HPLC columns used in this study were obtained from Column
Engineering (Ontario, CA).

Synthesis of Thiohydantoin Proline. Acetic anhydride (100 ml), acetic acid (20 ml), and
trifluoroacetic acid (10 ml) were added to N-acetylproline (500 mg), and the mixture was
stirred until dissolved. Trimethylsilylisothiocyanate (3 ml) was added and the mixture was
stirred at 60°C for 90 min. The reaction mixture was dried to a powder by rotary evaporation
and water (50 ml) was added. This solution was again dried by rotary evaporation and water
(20 ml) was added, whereupon a white powder formed. The suspension was kept on ice for
approximately 30 min, and the powder (thiohydantoin proline) was collected by vacuum
filtration. The yield was approximately 40%. The product was characterized by UV, FAB/MS,
and NMR. The UV absorption spectrum had a λmax of 271 nm in methanol. FAB/MS gave the
expected [M+H]+ = 157. NMR
δ 4.32 (Hα, m), 3.85 (Hδ, m), 3.43 (Hδ, m), 2.20 (Hβ and Hγ, m), 1.70 (Hβ, m).
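The ~40% yield can be put in mass terms. The sketch below assumes 1:1 stoichiometry with N-acetylproline as the limiting reagent, average atomic masses, and the formula C6H8N2OS for TH-proline (consistent with the [M+H]+ of 157). On those assumptions, 500 mg of N-acetylproline corresponds to about 497 mg of theoretical product, so ~40% is roughly 200 mg, in line with the scale cited in the Summary. The function names are hypothetical.

```python
from typing import Dict

# Average atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

N_ACETYLPROLINE = {"C": 7, "H": 11, "N": 1, "O": 3}   # starting material
TH_PROLINE = {"C": 6, "H": 8, "N": 2, "O": 1, "S": 1}  # assumed product formula


def average_mass(formula: Dict[str, int]) -> float:
    """Molar mass (g/mol) from an element -> count mapping."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def theoretical_mg(start_mg: float, start: Dict[str, int], product: Dict[str, int]) -> float:
    """Theoretical product mass assuming 1:1 stoichiometry (start is limiting)."""
    mmol = start_mg / average_mass(start)
    return mmol * average_mass(product)


theo = theoretical_mg(500.0, N_ACETYLPROLINE, TH_PROLINE)
print(f"theoretical {theo:.0f} mg; at 40%: {0.40 * theo:.0f} mg")
```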

Covalent  Coupling  of  Peptides  to  Carboxylic  Acid  Modified  Polyethylene.  Peptides 
were  covalently  coupled  to  carboxylic  acid  modified  polyethylene  and  quantitated  as 
described  (Shenoy  et  al.,  1992). 


Automated  C-Terminal  Sequencing  of  Polypeptides 




Application of Protein Samples to Zitex. The Zitex support (2 x 10 mm) was pre-wetted
with isopropanol, and protein samples (2-5 µl) dissolved in water were applied. The samples
were allowed to dry before sequencing.

HPLC Separation of the Amino Acid Thiohydantoins. Reversed-phase HPLC separation of the
thiohydantoin amino acid derivatives was performed on a C-18 (3 µm, 100 Å) Reliasil column
(2.0 mm x 250 mm) on a Beckman 126 Pump Module with a Shimadzu (SPD-6A) detector. The
column was eluted for 2 min with solvent A (0.1% trifluoroacetic acid in water), followed by a
discontinuous gradient to solvent B (10% methanol, 10% water, 80% acetonitrile) at a flow rate
of 0.15 ml/min at 35°C. The gradient was as follows: 0% B for 2 min, 0-4% B over 3 min,
4-35% B over 35 min, 35-45% B over 5 min, and 45-0% B over 3 min. Absorbance was
monitored at 265 nm.
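The gradient program is a piecewise-linear function of time and can be tabulated directly. This sketch reads the 35-45% B segment as a linear ramp over 5 min (an interpretation) and ignores system dwell volume; it gives the nominal %B at any point in the 48 min program and the solvent used at 0.15 ml/min. The names are hypothetical.

```python
from typing import List, Tuple

# (time in min, %B) breakpoints of the gradient described above.
GRADIENT: List[Tuple[float, float]] = [
    (0.0, 0.0),    # 0% B for 2 min
    (2.0, 0.0),
    (5.0, 4.0),    # 0-4% B over 3 min
    (40.0, 35.0),  # 4-35% B over 35 min
    (45.0, 45.0),  # 35-45% B over 5 min
    (48.0, 0.0),   # 45-0% B over 3 min
]
FLOW_ML_PER_MIN = 0.15


def percent_b(t: float) -> float:
    """Nominal %B at time t by linear interpolation between breakpoints."""
    if t <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # after the program ends


run_min = GRADIENT[-1][0]
print(f"run {run_min:.0f} min, {run_min * FLOW_ML_PER_MIN:.1f} ml total")
for t in (1.0, 5.0, 22.5, 46.5):
    print(f"t = {t:4.1f} min: {percent_b(t):.1f}% B")
```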

Automation  of  the  C-Terminal  Sequencing  Chemistry.  The  instrument  used  for 
automation  of  the  chemistry  described  in  this  manuscript  has  been  described  previously 
(Bailey  et  al.,  1993). 


RESULTS  AND  DISCUSSION 

Chemistry for Automated C-Terminal Sequence Analysis of Proline-Containing Polypeptides

When the acetic anhydride/TMS-ITC/TFA procedure used for the synthesis of TH-proline was
applied in our laboratory to the tripeptide N-acetyl-Ala-Phe-Pro, thiohydantoin proline was
formed in low yield (approx. 1-2% of theoretical). Recovery of the peptide products after the
reaction revealed that approximately half of the starting peptide was unchanged and the
remaining half had been decarboxylated at the C-terminus, thereby blocking it to C-terminal
sequence analysis. This was most likely caused by the high concentration of trifluoroacetic
acid, the excess of acetic anhydride, and the high temperature (80°C) at which the reaction
was performed. Substituting TMS-ITC for ammonium thiocyanate and lowering the reaction
temperature to 50°C also gave an approximately 2% yield of TH-proline, although no
decarboxylated peptide was formed.

The poor reaction with C-terminal proline most likely stems from the fact that proline cannot
form the oxazolinone needed for efficient reaction with the isothiocyanate. Previous work in
our laboratory has obviated the need for oxazolinone formation through the use of diphenyl
phosphoroisothiocyanatidate and pyridine. Reaction of this reagent with C-terminal proline
directly forms the acylisothiocyanate. Once the acylisothiocyanate is formed, the addition of
either liquid- or gas-phase acid followed by water was found to release proline as a
thiohydantoin amino acid derivative. Unlike thiohydantoin formation with the other 19
naturally occurring amino acids, formation of the C-terminal proline thiohydantoin requires
added acid to supply a hydrogen ion for protonation of the thiohydantoin ring nitrogen; this
protonation is necessary to stabilize the proline thiohydantoin ring. The resulting
quaternary-amine-containing thiohydantoin can then be readily hydrolyzed to a shortened
peptide and thiohydantoin proline by the introduction of water vapor or by the addition of
sodium trimethylsilanolate (the reagent normally used for cleavage of
peptidylthiohydantoins). Automation of this chemistry has allowed proline to be analyzed
sequentially without affecting the chemical degradation of the other amino acids.

The chemical scheme for C-terminal sequencing is shown in Figure 1. The first step involves
treatment of the peptide or protein sample with diisopropylethylamine to convert the
C-terminal carboxylic acid into a carboxylate salt. Derivatization of the C-terminal amino acid
to a thiohydantoin is accomplished with diphenyl phosphoroisothiocyanatidate (liquid phase)
and pyridine (gas phase). The peptide is then washed extensively with ethyl acetate and
acetonitrile to remove reaction by-products, and is treated briefly with gas-phase
trifluoroacetic acid followed by water vapor in case the C-terminal residue is a proline (this
treatment has no effect on residues other than proline). The derivatized amino acid is then
specifically cleaved with sodium or potassium trimethylsilanolate to generate a shortened
peptide or protein that is ready for continued sequencing. When the C-terminal proline has
already been removed by water vapor, the silanolate treatment merely converts the
C-terminal carboxylic acid group of the shortened peptide to a carboxylate. The thiohydantoin
amino acid is then quantitated and identified by reversed-phase HPLC.

Figure 1. Reaction scheme and postulated intermediates for the sequential C-terminal
degradation of polypeptides which may contain proline.
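The cycle above can be written down as an ordered list of steps, which makes explicit that the TFA and water-vapor treatments run in every cycle rather than only when proline is suspected. This is a descriptive sketch only: the `Step` structure and `reagent_order` are hypothetical, delivery phases are given only where the text states them, and no timings or volumes are implied.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    reagent: str
    delivery: str  # "liquid", "gas", "vapor", or "not stated"
    purpose: str


CYCLE: List[Step] = [
    Step("diisopropylethylamine", "not stated", "convert C-terminal COOH to carboxylate salt"),
    Step("diphenyl phosphoroisothiocyanatidate", "liquid", "derivatize C-terminal residue"),
    Step("pyridine", "gas", "derivatize C-terminal residue (with DPP-ITC)"),
    Step("ethyl acetate", "not stated", "wash out reaction by-products"),
    Step("acetonitrile", "not stated", "wash out reaction by-products"),
    Step("trifluoroacetic acid", "gas", "protonate proline thiohydantoin ring (no effect otherwise)"),
    Step("water", "vapor", "release TH-proline if present"),
    Step("Na or K trimethylsilanolate", "not stated", "cleave thiohydantoin; leave carboxylate"),
]


def reagent_order(cycle: List[Step]) -> List[str]:
    """Reagents in the order they are applied in one cycle."""
    return [step.reagent for step in cycle]


for i, step in enumerate(CYCLE, start=1):
    print(f"{i}. {step.reagent} ({step.delivery}): {step.purpose}")
# The released thiohydantoin is then identified by reversed-phase HPLC.
```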

The proposed role of trifluoroacetic acid (TFA) is to protonate the thiohydantoin proline ring.
The addition of water or silanolate salt is required for cleavage of the TH-proline. If the
temperature is raised to 80°C and the TFA step is prolonged (10 to 20 min), the acid alone
can cleave the TH-proline. Under the conditions used for the proline reaction at 50°C, TFA and
water do not cleave the peptidylthiohydantoins formed from the other 19 amino acids. This
makes it possible to include the steps needed for proline as routine steps in the automated
C-terminal sequencing program.

Examples  of  Automated  Sequence  Analysis 

Automated  C-terminal  sequencing  was  performed  on  a  compact  protein  sequencer 
designed  and  built  at  the  City  of  Hope  (Bailey  et  al.,  1993).  The  total  program  run  time  for 
a  cycle  of  C-terminal  sequencing  is  approximately  60  min. 

The performance of the automated method was evaluated by sequencing a number of peptide
and protein samples. Peptide samples were covalently attached to carboxylic acid modified
polyethylene (PE-COOH) before C-terminal sequence analysis. Proteins and longer
polypeptides (5 kDa and larger) were applied noncovalently to Zitex G-110 (porous Teflon).
Figure 2 shows the automated C-terminal sequence analysis of the tripeptide AFP (12 nmol).
The yield of the amino acid in cycle 3 is low because this is the amino acid that is covalently
attached to the solid support; this has been observed for all peptides covalently attached to
PE-COOH (Bailey et al., 1992a). Figure 3 shows the automated C-terminal sequence analysis
of polyproline (1 nmol; average molecular weight 12,000 daltons) applied noncovalently to
Zitex. The reduced yields of proline in cycles two and three are consistent with the known
washout of samples with molecular weights below 16,000 daltons. Figure 4 shows the
sequencing chemistry applied to ovalbumin (approx. 5-6 nmol) deposited noncovalently on
Zitex. The expected C-terminal sequence is ...-Val-Ser-Pro. Although there is considerable
cycle-to-cycle lag in this example, proline is clearly sequenced. Work is continuing toward
optimizing this automated chemistry.
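The sample-handling rules in this section amount to a simple decision on molecular weight. The sketch encodes them with the thresholds stated above (5 kDa and larger on Zitex; washout below about 16 kDa). The function name is hypothetical, and treating the thresholds as hard cutoffs is a simplification.

```python
from typing import Tuple


def choose_support(mw_da: float) -> Tuple[str, str]:
    """Suggest a sequencing support and any caveat from molecular weight (Da)."""
    if mw_da <= 0:
        raise ValueError("molecular weight must be positive")
    if mw_da < 5_000:
        # Peptides: covalent attachment; the attached residue gives a low yield.
        return "PE-COOH (covalent)", "low yield expected at the attached residue"
    note = "washout may reduce later-cycle yields" if mw_da < 16_000 else ""
    return "Zitex G-110 (noncovalent)", note


for mw in (400, 12_000, 44_000):
    print(mw, choose_support(mw))
```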


SUMMARY 

We  have  described  a  simple  procedure  for  the  large  scale  (200  mg)  synthesis  of 
thiohydantoin  proline  from  N-acetyl  proline  and  extensively  characterized  this  analogue. 
The  thiohydantoin  derivative  of  proline  is  conveniently  obtained  as  a  white  powder  which 
is  stable  to  long  term  storage.  The  availability  of  a  thiohydantoin  proline  standard  is  critical 
for  the  evaluation  of  automated  sequencing  results. 

We have described automated chemistry that is capable of the C-terminal sequence analysis
of polypeptides containing C-terminal proline. This chemistry has been integrated into the
automated sequencing program previously used for the C-terminal sequence analysis of the
other 19 amino acids without affecting performance.

We have proposed a chemical mechanism for proline sequencing via the thiohydantoin route
that is consistent with the experiments performed to date.
Figure 2. Automated C-terminal sequencing of the tripeptide AFP (12 nmol), covalently
attached to carboxylic acid modified polyethylene. Each thiohydantoin derivative is identified
by comparison to the retention time of an authentic standard. Unlabeled peaks are
background produced by reaction side products.

Figure 3. Automated C-terminal sequencing of polyproline (1 nmol) non-covalently applied to
Zitex. The average molecular weight of polyproline used was 12,000 daltons.

Figure 4. Automated C-terminal sequencing of ovalbumin (approx. 5-6 nmol) non-covalently
applied to Zitex. The expected sequence at the C-terminus is ...-Val-Ser-Pro.


The failure of previous methods to derivatize C-terminal proline may be due to the inability of
proline to form an oxazolinone, a necessary step in many of those methods. The use of
DPP-ITC/pyridine for derivatization permits the direct formation of an acylisothiocyanate at the
C-terminus without the need for oxazolinone formation.

Once an acylisothiocyanate is formed, it can cyclize to a quaternary-amine-containing
thiohydantoin. This thiohydantoin is stable if protonated with acid; if the acid step is omitted,
C-terminal proline is regenerated. The quaternary-amine-containing proline thiohydantoin can
be readily cleaved with water vapor or, alternatively, with the silanolate salt normally used for
the cleavage reaction.
Current Expectations for C-Terminal Sequencing

Current technology permits 1-3 cycles of automated C-terminal sequence analysis on
200 pmol to 4 nmol of non-covalently applied protein samples containing any of the twenty
common amino acids.

Work  is  continuing  toward  the  goal  of  extending  the  number  of  cycles  of  sequence 
information  which  can  be  obtained  with  this  automated  method. 
INTRODUCTION

Mass spectrometry (MS) is now widely accepted as an analytical technique complementary to
Edman-based approaches to peptide and protein structure determination. The value of MS
derives from its ability to accommodate mixtures and to characterize modified amino acid
residues. Tandem MS is particularly important in addressing both of these issues. Its essential
features are the promotion of ion fragmentation (generally following collision with a target
gas) and the establishment of connectivity between precursor and product ions; such analyses
can yield structural information for individual components of mixtures. A variety of
instrumental techniques have been used for tandem MS, differing in the choice of ion
analyzers and in the precise experimental conditions under which precursor ion activation
and decomposition take place. For example, tandem MS of peptides using four-sector mass
spectrometers generally involves high-energy collisional activation of precursor ions (usually
[M+H]+) to promote fragmentations that indicate sequence and permit differentiation of
isomeric/isobaric amino acid residues [1].

Tandem MS analyses using triple quadrupole or hybrid sector/quadrupole instruments usually
involve collisional activation at low energies, with a correspondingly extended time period
during which precursor ion decompositions can be observed. These conditions may promote
fragmentation processes that differ from those observed under high-energy collisional
activation; aspects of this ion chemistry remain to be elucidated. Our recent work has
included studies aimed at understanding the factors that determine low-energy
fragmentations of protonated peptides. This work has suggested that extensive diagnostic
fragmentation of protonated peptides via low-energy pathways is promoted by a precursor ion
population that is heterogeneous with respect to the site of charge [2]. This is consistent with
the general concept that low-energy decompositions are charge-directed, and in accord with
previous observations of the unfavorable fragmentation properties of peptides that
incorporate strongly basic residues (such as arginine) [3]. Definitive evidence comes from the
analysis of peptides that have been converted to pre-charged derivatives: little fragmentation
diagnostic of sequence is observed [2].

Figure 1. Structures of 4-hydroxynonenal and putative products of reaction with amino acids
(panels: HNE-modified cysteine, HNE-modified histidine, HNE-modified lysine).

Electrospray ionization has proved highly compatible with tandem mass spectrometry. As anticipated in the discussion above of low energy fragmentations, the multiplicity of
protonation  sites  promotes  extensive  fragmentation.  Much  further  work  is  required,  however, 
to  improve  our  understanding  of  the  fragmentations  of  multiply  charged  ions  and  facilitate 
the  interpretation  of  product  ion  spectra  derived  from  unknowns.  Nevertheless,  impressive 
examples  have  been  reported  of  the  high  sensitivity  characterization  of  peptides  using 
tandem  MS  of  low  charge  states  [4]. 

In the present report we describe two aspects of this laboratory's recent work on the tandem MS of peptides. The first area concerns model studies on the detection and characterization of peptides modified by reaction with 4-hydroxynonenal, a common product of
lipid  oxidation.  Figure  1  shows  the  structure  of  4-hydroxynonenal  and  of  putative  products 
of  reaction  with  cysteine  and  histidine  (via  Michael  addition),  and  with  lysine  (involving 
formation  of  a  Schiff’s  base).  Secondly,  we  describe  further  studies  of  intra-ionic  acid/base 
interactions  in  gas-phase  peptide  ions. 


EXPERIMENTAL  METHODS 
Materials 

Bovine  insulin  B-chain  (in  which  cysteine  residues  were  oxidized  to  cysteic  acids) 
was  purchased  from  Sigma  and  used  as  supplied.  Angiotensin  III  (2-7)  was  prepared  from 
angiotensin (Sigma) via tryptic hydrolysis followed by HPLC purification. Similarly, the peptide SCFR was prepared from RLCIFSCFR (synthesized in the School of Biological Sciences, University of Manchester) via chymotryptic digestion followed by HPLC purification. Oxidation of cysteine to cysteic acid residues was carried out using performic acid
as  described  by  Burlet  et  al.  [5]. 

4-hydroxynonenal  (HNE)  was  synthesized  by  a  modification  of  the  method  of 
Esterbauer  and  Weger  [6];  the  details  will  be  provided  elsewhere  (manuscript  in  preparation). 
Reaction  of  various  peptides  with  HNE  was  carried  out  as  follows.  The  peptide  standards 
were dissolved to a concentration of 1 mg/ml in an aqueous 0.1 M K2HPO4 buffer solution which had been adjusted to pH 7.4 by the addition of 0.1 M KH2PO4. A ten-fold molar excess
of  HNE  was  added  and  the  samples  were  vortexed  for  1  min  before  incubation  at  37°C  for 
6-24  h.  The  incubations  containing  angiotensin  III  (2-7)  or  SCFR  were  fractionated  by 
reverse  phase  HPLC,  whereas  the  insulin  B-chain  incubation  was  subjected  to  rapid  HPLC 
for  de-salting  purposes  only. 
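The ten-fold molar excess of HNE is straightforward stoichiometry. The minimal sketch below uses a hypothetical helper, `hne_mass_ug`, and assumes an average HNE mass of about 156.22 g/mol (C9H16O2); the peptide molecular weight and volume are user-supplied inputs, not values from the original protocol.

```python
# Minimal sketch (hypothetical helper): mass of 4-hydroxynonenal (HNE)
# needed for an N-fold molar excess over a peptide in solution.

HNE_AVG_MW = 156.22  # g/mol, approximate average mass of HNE (C9H16O2)


def hne_mass_ug(peptide_mg_per_ml: float, volume_ml: float,
                peptide_mw: float, excess: float = 10.0) -> float:
    """Return the mass of HNE in micrograms for an `excess`-fold molar excess."""
    # mg / (g/mol) gives mmol; multiply by 1000 to obtain micromoles.
    peptide_umol = peptide_mg_per_ml * volume_ml / peptide_mw * 1000.0
    hne_umol = excess * peptide_umol
    # micromoles x g/mol = micrograms
    return hne_umol * HNE_AVG_MW


if __name__ == "__main__":
    # Hypothetical example: 0.5 ml of a 1 mg/ml peptide with MW 3500 g/mol.
    print(f"{hne_mass_ug(1.0, 0.5, 3500.0):.1f} ug HNE")
```

In practice HNE is usually handled as a solution, so the computed mass would be converted to a volume of a stock of known concentration.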

Enzymatic  Digestions 

Removal of the C-terminal arginine residue of RLCIFSCFR was achieved by carboxypeptidase B hydrolysis using a modification of the procedure described by Allen [7]. The enzyme was added to a solution of the peptide in 0.1 M ammonium acetate (pH 8.5) to give an enzyme:substrate molar ratio of approximately 1:100. Hydrolysis was allowed to proceed for approximately 1 h, with monitoring of the progress of the reaction by mass
spectrometry. 
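Monitoring the carboxypeptidase B reaction by mass spectrometry amounts to watching for a peak shifted by one arginine residue. The sketch below computes the expected monoisotopic [M+H]+ values before and after removal of the C-terminal Arg, assuming both cysteines are free thiols (a disulfide-bonded form would be about 2.016 Da lighter in both cases).

```python
# Minimal sketch: expected monoisotopic mass change when carboxypeptidase B
# removes the C-terminal arginine from RLCIFSCFR. Assumes free (reduced)
# cysteine thiols; a disulfide would lower both masses by ~2.016 Da.

# Monoisotopic residue masses (Da) for the residues present in this peptide.
RESIDUE_MONO = {
    "R": 156.10111, "L": 113.08406, "I": 113.08406,
    "C": 103.00919, "F": 147.06841, "S": 87.03203,
}
WATER = 18.01056   # monoisotopic mass of H2O
PROTON = 1.00728   # mass of a proton


def neutral_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of a linear peptide: residues plus one water."""
    return sum(RESIDUE_MONO[aa] for aa in sequence) + WATER


intact = neutral_mass("RLCIFSCFR")
truncated = neutral_mass("RLCIFSCF")  # product after loss of the C-terminal Arg
print(f"[M+H]+ intact:    {intact + PROTON:.3f}")
print(f"[M+H]+ truncated: {truncated + PROTON:.3f}")
print(f"mass shift:       {intact - truncated:.3f} Da")
```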

HPLC  Separations.  HPLC  was  performed  using  a  Waters  600  HPLC  pump  and 
controller  with  a  Waters  490  variable  wavelength  UV  detector.  Separation  was  achieved 
using  a  Waters  Novapak  C18  column  (3.9  x  150  mm).  A  linear  gradient  was  formed  from 
100%  to  0%  A  in  50  min  at  1  ml/min.  Solvent  A  was  water  with  0.1%  trifluoroacetic  acid. 
The UV detector was set to monitor 217 nm and the HPLC fractions were collected into 1.5
ml  polypropylene  tubes  and  dried  under  reduced  pressure.  HPLC  desalting  was  performed 
using a Vydac C8 column (2.1 x 150 mm) with a 5 min isocratic elution at 100% solvent A
followed  by  a  step  gradient  elution  to  20%  A.  The  flow  rate  was  0.5  ml/min  and  detection 
was at 217 nm. The UV peak eluting after the step gradient was collected and dried as above.

Mass  Spectrometric  Analyses 

FAB MS analyses were performed using a VG 7070Q instrument with the configuration electric sector (E)/magnetic sector (B)/collision quadrupole (q)/analyzer quadrupole (Q). The FAB primary beam was xenon atoms of 8 keV energy. The liquid matrix was a 1:1 mixture of bis-(2-hydroxyethyl)-disulfide and thioglycerol. For tandem MS analyses, precursor ions were selected at a resolution of 1000 using EB and subjected to collisional activation in q. The pressure of argon collision gas was sufficient to decrease the transmission of the precursor ion by ca. 80%. Product ions were recorded by scanning Q, with the resolution set to achieve peak widths of 1-2 m/z units. Alternatively, precursor ion scans were obtained by scanning B with Q set to transmit a selected product ion. Data were recorded via a VG 11/250 data system, with acquisition in the “multi-channel analyzer” mode; 5-15 scans were accumulated.

Electrospray  MS  analyses  were  performed  (through  the  courtesy  of  Dr.  M.  Morris, 
Fisons  Instruments)  using  a  VG  Quattro  II  triple  quadrupole  instrument.  Analytes  were 
introduced by loop injection into a stream of acetonitrile:water (1:1) containing 0.2% formic
acid.  Ionization  was  by  nebulizer-assisted  electrospray;  the  nebulizer  gas  was  nitrogen  and 
the  electrospray  needle  was  held  at  3.5  kV.  Precursor  ions  were  selected  at  a  low  resolution 
sufficient  to  pass  all  of  the  major  isotopic  variants  into  the  collision  hexapole.  Selected  ions 
were  subjected  to  collision  with  argon  at  a  pressure  of  5  x  10*^  mbar.  The  collision  energy 
was optimized for maximum fragmentation efficiency in each analysis. Product ions were
scanned  at  unit  resolution. 

Electrospray  MS  analyses  of  HNE-modified  angiotensin  III  (2-7)  were  performed  on  a 
Sciex API III triple quadrupole instrument. The analyte was dissolved in water/acetonitrile/acetic acid (49/49/2) and introduced into the ion source via constant infusion at 10 µl/min using a
Harvard  Apparatus  syringe  driver.  The  nebulizer  gas  was  nitrogen  and  the  electrospray  needle 
was  held  at  4.5  kV.  Precursor  ions  were  selected  at  low  resolution  and  introduced  into  the 
collision  quadrupole  which  was  maintained  with  an  argon  collision  gas  thickness  of  6.4  x  1 0^"^ 
atoms/cm². Product ions were scanned with resolution set to achieve peak widths of 3 m/z units.

Matrix-assisted  laser  desorption/ionization  (MALDI)  analyses  were  performed  (by 
courtesy  of  Fisons  Instruments)  on  a  VG  TofSpec  instrument  equipped  with  a  nitrogen  laser. 
Samples were prepared in aqueous 0.1% trifluoroacetic acid at a concentration of ca. 10 picomole/µl per component. A 2 µl aliquot was mixed with 2 µl of a freshly prepared aqueous
solution  of  0.1%  trifluoroacetic  acid  saturated  with  2,5-dihydroxybenzoic  acid.  The  mass 
resolution  was  approximately  500  (FWHM). 


RESULTS  AND  DISCUSSION 

As  an  initial  assessment  of  the  reactivity  of  different  amino  acid  residues  with 
4-hydroxynonenal,  we  have  investigated  the  sites  of  modification  of  a  model  oligopeptide, 
the B-chain of insulin incorporating oxidized cysteine residues (i.e., cysteic acids). Figure 2 compares the MALDI/TOF MS analyses of the chymotryptic digest of the unmodified peptide and the product of reaction with 4-hydroxynonenal. The peptide sequence and the expected chymotryptic cleavage sites are indicated in Figure 3. Reaction with HNE resulted in the incorporation of one or two HNE moieties in fragment 1-16, consistent with modification of the histidine residues. None of the peaks corresponding to fragments 17-24, 17-25, 25-30 or 26-30 was shifted in mass, indicating that Lys29 and other residues were not
modified  or  that  the  modifications  were  labile  under  the  conditions  of  analysis. 

Tandem MS provides a powerful approach to confirming the structure of lipid-modified peptides and proteins. We have therefore initiated a study of the fragmentation behavior of HNE-modified peptides. Figure 4 shows the product ion spectrum obtained by collisional activation of [M+2H]2+ ions formed from HNE-adducted angiotensin III (2-7) during electrospray
MS  analysis.  The  decomposition  is  highly  efficient  and  yields  a  variety  of  diagnostic  fragment 
ions  corresponding  to  the  products  of  one  or  two  cleavages  within  the  peptide  chain.  The  first 
category  includes  N-terminal  fragments  (a-  and  b-series)  and  C-terminal  fragments  (y-series), 
where  the  nomenclature  used  is  Biemann’s  modification  [8]  of  the  suggestion  of  Roepstorff  and 
Fohlman [9]. Product ions resulting from two chain cleavages include “internal” fragments (bmyn) and immonium ions representing single amino acid residues. Immonium ions corresponding to
the  tyrosine  and  modified  histidine  residues  are  particularly  prominent.  The  latter  appears  at  m/z 
266,  consistent  with  Michael  addition  of  HNE  to  the  imidazole  ring. 
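The assignment of m/z 266 can be checked with the usual relationship for immonium ions, m/z = residue mass - CO + H+. The sketch below assumes that Michael addition incorporates the intact HNE molecule (C9H16O2, monoisotopic 156.11503 Da) into the histidine side chain, and also computes the unmodified tyrosine immonium ion for comparison.

```python
# Sketch: immonium-ion m/z = residue mass - CO + H+ (monoisotopic values).
CO = 27.99491
PROTON = 1.00728
HNE_MONO = 156.11503  # C9H16O2, assumed to be added intact by Michael addition

RESIDUE_MONO = {"H": 137.05891, "Y": 163.06333}


def immonium_mz(residue_mass: float) -> float:
    """Return m/z of the singly charged immonium ion for a residue mass."""
    return residue_mass - CO + PROTON


his_hne = immonium_mz(RESIDUE_MONO["H"] + HNE_MONO)
tyr = immonium_mz(RESIDUE_MONO["Y"])
print(f"HNE-His immonium: {his_hne:.2f}")  # close to the observed m/z 266
print(f"Tyr immonium:     {tyr:.2f}")
```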

The  same  modified  peptide  was  analyzed  by  FAB/tandem  MS  with  CAD  of  the  singly 
protonated molecule, MH+. The product ion spectrum (Figure 5) shows the modified
histidine  immonium  ion  (m/z  266)  as  the  most  prominent  product  ion.  The  apparent 
propensity  to  fragment  in  this  way  suggests  a  means  for  screening  for  the  presence  of 
HNE-modified  histidine-containing  peptides  in  mixtures  such  as  protein  digests.  Figure  6 
illustrates  this  principle  with  the  analysis  of  a  simple  binary  mixture  of  angiotensin  III  (2-7) 
and  the  HNE-adducted  analogue.  Tandem  MS  analysis  involved  scanning  of  the  first  mass 
analyzer  with  the  second  set  to  transmit  a  single  product  ion  species  (in  this  case  m/z  266). 
The  resulting  precursor  ion  spectrum  reveals  the  mixture  component  (and  its  ion  source- 
formed  fragments)  which  incorporates  a  modified  histidine  residue. 
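The logic of the precursor ion scan can be emulated in software: every precursor whose product ions include the selected m/z is reported. This toy sketch uses a hypothetical helper, `precursor_scan`, and hypothetical product-ion lists. The precursor values approximate the monoisotopic MH+ of VYIHPF (about 775.4) and of its HNE adduct (+156.1). The 0.5 m/z tolerance is likewise an assumption.

```python
# Toy emulation of a precursor-ion scan: report every precursor whose product
# ion list contains the selected product m/z within a tolerance.
from typing import List, Tuple

Spectrum = Tuple[str, float, List[float]]  # (label, precursor m/z, product m/z values)


def precursor_scan(spectra: List[Spectrum], product_mz: float,
                   tol: float = 0.5) -> List[Tuple[str, float]]:
    """Return (label, precursor m/z) for precursors yielding `product_mz`."""
    hits = []
    for label, precursor_mz, products in spectra:
        if any(abs(p - product_mz) <= tol for p in products):
            hits.append((label, precursor_mz))
    return hits


# Hypothetical product-ion lists for a two-component mixture.
mixture = [
    ("unmodified peptide", 775.4, [110.1, 136.1, 263.1]),
    ("HNE-adducted peptide", 931.5, [136.1, 266.2, 419.2]),
]
print(precursor_scan(mixture, 266.0))
```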
Figure 2. (a) MALDI/TOF MS analysis of the chymotryptic digest of oxidized (cysteine to cysteic acid) insulin
B-chain;  (b)  equivalent  analysis  of  the  oligopeptide  modified  by  reaction  with  4-hydroxynonenal. 


FVNQHLC(ox)GSHLVEALY↓LVC(ox)GERGF↓F↓YTPKA

Figure 3. The sequence of the oxidized B-chain of bovine insulin. C(ox) represents cysteic acid residues. The arrows indicate the observed points of chymotryptic cleavage.
Figure 4. Product ion spectrum obtained following collisional activation of [M+2H]2+ ions obtained by
electrospray  of  VYIH*PF  (where  H*  represents  the  modification  of  the  histidine  residue  by  reaction  with 
4-hydroxynonenal). 


The  analyses  reported  here  represent  the  initial  phase  in  the  development  of  a 
screening  and  characterization  strategy  for  the  study  of  lipid-modified  peptides  and  proteins. 
Tandem  MS  plays  a  central  role  in  this  strategy;  its  most  effective  use  will  be  ensured  by 
improved  understanding  of  the  relationship  between  peptide  sequence  and  the  propensities 
to  fragment  via  a  number  of  pathways.  We  have  therefore  pursued  investigations  of  the  low 
energy  fragmentations  of  protonated  peptides,  with  particular  reference  to  the  influence  on 
fragmentation  of  the  site  of  charge. 

Figure 5. Product ion spectrum obtained following collisional activation of MH+ ions obtained by FAB of VYIH*PF (where H* represents the modification of the histidine residue by reaction with 4-hydroxynonenal).

Figure 6. Spectrum of precursors of m/z 266 recorded during FAB tandem MS of a mixture of angiotensin III (2-7) and the HNE-adducted analogue. Only the modified peptide is detected, with signals corresponding to the MH+ ion, a matrix adduct and minor fragment ions formed in the ion source.

Figure 7. Product ion spectrum obtained following collisional activation of MH+ ions obtained by FAB of RLCIFSCFR.

Figure 8. Product ion spectrum obtained following collisional activation (40 eV collision energy) of [M+2H]2+ ions obtained by electrospray of RLCIFSCFR.

Figure 7 shows the product ion spectrum obtained following low energy CAD of singly protonated RLCIFSCFR (generated by FAB). A C-terminal rearrangement ion (representing loss of the C-terminal residue but retention of one carboxylic oxygen [10]) is observed as a highly favored fragmentation. Lower members of the rearrangement ion series (bn+H+OH) are also observed, but these and all other fragment ions have low abundance. Low energy CAD of the doubly protonated analogue (generated by electrospray) shows a low overall efficiency of fragmentation (Figure 8; note the magnification factors), though a large number of diagnostic fragment ions are discernible. Interestingly, there is no evidence for the occurrence of the C-terminal rearrangement process. The high gas-phase basicity of arginine suggests that the principal structure in the population of doubly charged precursor ions carries the two protons on the guanidino groups of the two arginine residues, resulting in little charge-directed fragmentation of the peptide chain.
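In the notation used above, the rearrangement ion (bn+H+OH) is the b ion plus one water, so for a peptide of n residues the ion from loss of the C-terminal residue sits at MH+ minus that residue's mass. The sketch computes this value together with the m/z of the [M+2H]2+ precursor for RLCIFSCFR. It assumes monoisotopic masses and free cysteine thiols; these computed values are illustrative and are not read from the spectra.

```python
# Sketch: m/z of the C-terminal rearrangement ion [b(n-1) + H2O + H]+, i.e.
# MH+ minus the C-terminal residue, and of the [M+2H]2+ precursor.
# Assumes monoisotopic masses and free cysteine thiols.
RESIDUE_MONO = {
    "R": 156.10111, "L": 113.08406, "I": 113.08406,
    "C": 103.00919, "F": 147.06841, "S": 87.03203,
}
WATER = 18.01056
PROTON = 1.00728


def neutral_mass(seq: str) -> float:
    """Monoisotopic neutral mass: residues plus one water."""
    return sum(RESIDUE_MONO[aa] for aa in seq) + WATER


def mz(neutral: float, charge: int) -> float:
    """m/z of the [M+zH]z+ ion."""
    return (neutral + charge * PROTON) / charge


seq = "RLCIFSCFR"
m = neutral_mass(seq)
mh = mz(m, 1)
rearr = mh - RESIDUE_MONO[seq[-1]]  # C-terminal residue lost, one carboxyl oxygen kept
print(f"MH+ {mh:.3f}  rearrangement ion {rearr:.3f}  [M+2H]2+ {mz(m, 2):.3f}")
```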

This  hypothesis  was  evaluated  by  electrospray  tandem  MS  analysis  of  the  same 
peptide  following  oxidation  of  the  cysteine  residues  to  cysteic  acid.  Previous  studies 
[5]  suggested  that  intraionic  acid-base  interactions  between  cysteic  acid  and  arginine 
residues  reduce  the  propensity  for  charge  location  on  the  arginine  residues,  resulting 
in  increased  yields  of  diagnostic  ions  associated  with  cleavages  of  the  peptide  chain. 
Figure  9  shows  the  product  ion  spectrum  obtained  by  CAD  of  doubly  protonated 
RLC(SO3H)IFSC(SO3H)FR, where C(SO3H) represents cysteic acid. In marked contrast
to  the  equivalent  data  for  the  unoxidized  peptide,  a  high  decomposition  efficiency  is 
observed  with  the  production  of  multiple  C-terminal  and  N-terminal  fragment  ions. 
These  findings  are  consistent  with  a  precursor  ion  population  heterogeneous  with 
respect  to  the  site  of  charge,  a  situation  facilitated  by  the  putative  cysteic  acid/arginine 
interactions. 
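Performic acid oxidation converts each cysteine thiol to a sulfonic acid, adding three oxygen atoms per residue. The sketch below computes that mass increase, assuming both cysteines of RLCIFSCFR start as free thiols. If they were initially disulfide-linked, the oxidation would also add two hydrogens, since performic acid cleaves the disulfide.

```python
# Sketch: mass increase when performic acid converts cysteine (-SH) to
# cysteic acid (-SO3H): +3 O per residue (monoisotopic). Assumes free thiols.
OXYGEN = 15.99491


def cysteic_acid_shift(sequence: str) -> float:
    """Monoisotopic mass added by oxidizing every Cys in `sequence`."""
    return sequence.count("C") * 3 * OXYGEN


shift = cysteic_acid_shift("RLCIFSCFR")
print(f"+{shift:.3f} Da for {'RLCIFSCFR'.count('C')} cysteines")
```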
Figure 9. Product ion spectrum recorded following collisional activation of [M+2H]2+ ions obtained by
electrospray of RLC(SO3H)IFSC(SO3H)FR, where C(SO3H) represents cysteic acid.
INTRODUCTION

For  structural  studies  of  various  proteins,  a  combination  of  traditional  sequence 
analysis  and  mass  spectrometry  (MS)  has  been  effectively  used  in  our  laboratory.  Protein 
sequencing methods and mass spectrometry are often plagued by the same problem: samples are frequently contaminated with other materials. Many buffer components, salts, and solubilizing detergents can interfere with or prevent the successful application of both methodologies. Membranes made of materials such as polyvinylidene difluoride (PVDF) have been invaluable in this regard for protein sequencing. These membranes have allowed direct analysis of proteins from complex mixtures by permitting separation by gel electrophoresis and subsequent electroblotting, which immobilizes the protein and removes potentially interfering small molecules (Matsudaira, 1987). PVDF membranes are also stable to most organic solvents. Mass spectrometry is a sensitive bioanalytical method (McCloskey, 1990), but it is often difficult for the method to discriminate selectively against most species found in a sample in search of the few components in which the scientist is truly interested. MS analysis of a sample containing a small amount of peptide in a great molar excess of buffer salts typically results in a mass spectrum composed mostly of buffer ions. Desalting or chromatographic steps are therefore often necessary prior to analysis by mass spectrometry, but they add complexity and increase the total analysis time.

To  increase  the  throughput  of  our  bioanalytical  laboratory,  we  have  been  investigating 
simple,  rapid  methods  of  sample  preparation  for  sequence  analysis  and  mass  determination. 
This  report  presents  some  of  the  observations  we  have  made  towards  this  goal  and  their 
applications. Electrospray ionization (ESI) (Fenn, Mann et al., 1989; Smith, Loo et al., 1990) and matrix-assisted laser desorption/ionization (MALDI) (Karas, Bahr et al., 1989; Hillenkamp, Karas et al., 1991) have advanced the applicability of mass spectrometry to large biomolecule analyses. We present results demonstrating the unique characteristics of an array detector for ESI detection. Its ability to discriminate among ions on the basis of charge allows direct detection of proteins down to the low attomole level (Loo and Pesch, 1994). Detection of higher-charged protein molecules in the presence of higher concentrations of lower molecular weight contaminants will also be demonstrated.

Methods in Protein Structure Analysis, Edited by M. Z. Atassi and E. Appella
Plenum Press, New York, 1995

S. D. Buckel et al.

To monitor the degree of truncation by carboxypeptidase for C-terminal sequence determination of a small protein, the molecular weights of the digest products in the mixture were determined by ESI-MS and MALDI-MS without additional sample preparation (i.e., without removal of extraneous buffer materials). The number of disulfide bonds present in small, tightly bridged peptides can be determined by measuring the difference in mass between the oxidized material and the peptide reduced by tris-(2-carboxyethyl)phosphine, without additional derivatization of the resulting free cysteines. We have also been able to determine the molecular weights of proteins from two-dimensional gel spots that had been blotted onto PVDF membranes, stained with Coomassie blue, and extracted with hexafluoroisopropanol, and to determine the sequences of peptides via chemical digestion of the material not used for mass determination.


EXPERIMENTAL 
Mass  Spectrometry 

ESI-MS  analyses  were  performed  with  a  Finnigan  MAT  900Q  forward  geometry 
hybrid  mass  spectrometer  (Bremen,  Germany)  equipped  with  a  20  kV  conversion 
dynode/electron multiplier point detector and a focal plane array detector (position-and-time-resolved ion counting, or PATRIC) (Loo, Ogorzalek Loo et al., 1993; Loo and Pesch, 1994). For operation with the microchannel plate array detector, an m/z window 8% wide, centered on the m/z value at the detector, was used. The voltage across the front and back of the microchannel plates is designated as Vmcp. An electrospray ionization interface based on a heated glass capillary inlet was used (Fenn, Mann et al., 1989). Sample solutions for ESI-MS analysis were infused into the ESI source at a flow rate of 0.5-1.5 µl/min. The typical solution composition was 1:1 MeOH/H2O with 1-2.5% acetic acid (v/v).

MALDI  mass  spectra  were  acquired  with  a  PerSeptive  Biosystems-Vestec  (Houston, 
TX)  LaserTec  Research  time-of-flight  mass  spectrometer  operating  in  the  linear  mode. 
Samples were prepared by placing 1 µl of a 1-10 pmol/µl solution (0.1% trifluoroacetic acid in H2O) of the peptide on the sample target and adding 1 µl of a solution of α-cyano-4-hydroxycinnamic acid (4-HCCA, 5 µg/µl in 1:2 acetonitrile/0.1% trifluoroacetic acid (aq.)).

Extraction  from  Blots 

Spots from two blots of two-dimensional gels were cut into pieces of about 1 mm x 1 mm and placed in a microcentrifuge tube, and 200 µl of hexafluoroisopropanol (HFIP) was added. This mixture was incubated in an Eppendorf Thermomixer with shaking at 25°C for 30 minutes. The solution was removed, an additional 200 µl of HFIP was added, and the extraction was repeated. The second extract was pooled with the first and lyophilized. The dried sample was dissolved in 50 µl of 5% acetic acid.

Materials 

Melittin,  substance  P,  gramicidin  S,  ACTH,  ATX  II  toxin,  and  carbonic  anhydrase 
were  purchased  from  Sigma  Chemical  Co.  (St.  Louis,  MO).  Bovine  pancreatic  trypsin 
inhibitor (BPTI or aprotinin) was obtained from BiosPacific (Emeryville, CA). Tris-(2-carboxyethyl)phosphine (TCEP) was purchased from Molecular Probes (Eugene, OR).
RESULTS AND DISCUSSION

ESI-MS  of  Salt-Containing  Samples 

The  ability  to  directly  mass  analyze  protein  materials  without  additional  sample 
cleanup results from the discriminating nature of the focal-plane array detector (Loo and Pesch, 1994). The use of a focal plane detector on a magnetic sector mass spectrometer with ESI has allowed low-to-sub-femtomole detection limits for large proteins (Cody, Tamura et al., 1994; Loo and Pesch, 1994). Full scan spectra (m/z 500-3000) can be collected while consuming less than 5 femtomoles of bovine albumin (66 kDa), and less than 500 attomoles from a 2 fmol/µL solution of porcine pepsin (34 kDa) (Loo and Pesch, 1994). The array detector
can  be  “tuned”  for  higher  charged,  low  level  species  in  a  complex  mixture  and/or  in  the 
presence  of  low  molecular  weight  material. 

The  presence  of  salts  and  buffers  can  often  be  disabling  for  ESI-MS  experiments. 
However,  the  advantages  of  low  molecular  weight  (low  charge)  discrimination  can  be 
realized  for  high  levels  of  interfering  buffers  and  other  additives  used  in  protein  chemistry. 
Compared  to  singly  charged  ions,  highly  charged  ions  generate  many  more  secondary 
electrons  upon  hitting  the  microchannel  plate  of  the  array  detector  due  to  the  very  high  kinetic 
energy of these ions. The nature of the PATRIC array detection electronics allows only signal levels between the minimum and maximum thresholds to be counted. The array detector can therefore discriminate between low and highly charged ions through the voltage applied to the channel plates (Vmcp). In order to selectively detect only highly charged ions, Vmcp is reduced to decrease the number of secondary electrons and place their signal within the “acceptable” window.
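The threshold-window principle can be illustrated with a deliberately simple toy model. It assumes that pulse height is proportional to ion charge times an exponential microchannel-plate gain in Vmcp. The thresholds, gain slope, and reference voltage are hypothetical numbers chosen only to reproduce the qualitative behavior described above; they are not PATRIC specifications.

```python
# Toy model (illustrative assumptions only): pulse size grows with ion charge
# and with microchannel-plate gain; an ion is counted only if its pulse falls
# between the minimum and maximum thresholds of the detection electronics.
import math

THRESHOLD_MIN = 2.0    # hypothetical lower discriminator level (arbitrary units)
THRESHOLD_MAX = 100.0  # hypothetical upper discriminator level
GAIN_SLOPE = 0.015     # hypothetical exponential gain coefficient per volt
V_REF = 600.0          # hypothetical voltage at which the gain equals 1


def pulse_height(charge: int, vmcp: float) -> float:
    """Relative pulse height: charge times an assumed exponential MCP gain."""
    return charge * math.exp(GAIN_SLOPE * (vmcp - V_REF))


def is_counted(charge: int, vmcp: float) -> bool:
    """True if the pulse lies inside the counting window."""
    return THRESHOLD_MIN <= pulse_height(charge, vmcp) <= THRESHOLD_MAX


for vmcp in (750.0, 635.0):
    print(vmcp, {z: is_counted(z, vmcp) for z in (1, 30)})
```

With these assumed parameters, singly charged ions are counted at the higher voltage and rejected at the lower one, while a 30+ ion behaves the opposite way.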

Figure 1. Electrospray ionization mass spectra of 3.4 pmol/µL bovine carbonic anhydrase (29 kDa) with 0.02% Triton X-100 (reduced) in 2:1 MeOH:H2O and 2.5% acetic acid with Vmcp at (a) +750 V and (b) +635 V. Approximately 7.9 pmol of protein was consumed during acquisition of the spectrum in panel (b).

Figure 2. ESI mass spectra of 7.9 pmol/µL bovine carbonic anhydrase (29 kDa) with 50 mM Tris in 2:1 MeOH:H2O and 2.5% acetic acid with Vmcp at (a) +850 V and (b) +700 V.

Triton X-100, a nonionic polyoxyethylene detergent, is often used to solubilize proteins. ESI mass spectra of a 3.4 pmol/µL solution of bovine carbonic anhydrase (29 kDa) in the presence of 0.02% (w/v) Triton X-100 are shown in Figure 1. As expected, the mass spectrum with Vmcp at +750 V shows only singly charged ions for the Triton X-100 oligomers, whereas reducing Vmcp to +635 V allows detection of the multiply charged carbonic anhydrase molecules. Similarly, a spectrum of carbonic anhydrase in the presence of 50 mM Tris can be obtained at the lower Vmcp voltages (Figure 2).
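Once the multiply charged protein ions are detected, the molecular weight is usually obtained from the charge-state series. The sketch below shows one standard approach, not necessarily the one used by the authors: two adjacent peaks are enough to assign the charge and the neutral mass. The 29,000 Da test mass is a hypothetical value, not a measurement from Figure 1.

```python
# Sketch: assign charge and neutral mass from two adjacent peaks of an ESI
# charge-state series. m_low carries one more proton than m_high.
from typing import Tuple

PROTON = 1.00728


def charge_and_mass(m_high: float, m_low: float) -> Tuple[int, float]:
    """Return (charge of m_high, neutral mass) from adjacent peaks m_high > m_low."""
    # m_high - m_low = M / (z(z+1)) and m_low - H+ = M / (z+1), so their ratio is z.
    z = round((m_low - PROTON) / (m_high - m_low))
    mass = z * (m_high - PROTON)
    return z, mass


M_TEST = 29000.0  # hypothetical neutral mass for illustration
peaks = [(M_TEST + z * PROTON) / z for z in (20, 21)]
print(charge_and_mass(peaks[0], peaks[1]))
```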

C-Terminal  Sequence  Analysis  with  MS  Detection 

C-terminal sequence information can be obtained directly by combining traditional proteolytic chemistry and mass spectrometry. The products of enzyme reactions can be monitored by MS at various stages of the reaction. For example, a sample of Aga IVA, a toxin from spider venom that is highly bridged by disulfide bonds, was dissolved in 50 mM sodium citrate pH 4.0 with carboxypeptidase Y and allowed to incubate for 24 hours at 28°C. The sample was then submitted for mass determination by ESI-MS. A portion of the original solution was diluted by a factor of 5 with a solution of H2O/methanol containing 2.5% acetic acid and directly infused into the ESI source. The results are shown
in  Figure  3.  The  mass  of  the  observed  peptide  corresponds  to  the  first  39  residues  of  the 
original  polypeptide  (i.e.,  9  residues  from  the  C-terminus  were  liberated  from  the  peptide  by 
carboxypeptidase  Y).  The  voltage  on  the  microchannel  plates  of  the  array  detector  was 
reduced  to  “selectively”  detect  the  multiply  charged  peptide  ions  and  discriminate  against 
the  much  more  abundant  ions  from  the  buffer  components. 

Figure 3. ESI mass spectrum of carboxypeptidase Y-digested Aga IVA in the presence of 10 mM sodium citrate buffer.

Similarly, carboxypeptidase Y was used to cleave C-terminal residues from epidermal growth factor (EGF 1-48). The sample, in 50 mM sodium citrate at pH 4.0, was analyzed by MALDI-MS at various time points of the reaction. The mass spectra acquired after 1 minute and 1 hour are shown in Figure 4. Interference from citrate buffer salts was reduced by diluting the sample by a factor of 10 prior to MS analysis. Protein products with masses of 5445, 5317, and 5204 Da can easily be measured from the MALDI-MS data, representing the intact protein, the loss of Lys, and the further loss of Leu from the C-terminus.
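Reading a carboxypeptidase ladder means matching each successive mass difference to a residue mass. This sketch uses average residue masses, which suit linear-mode MALDI, and an assumed tolerance of 0.5 Da. At this accuracy Lys and Gln (about 0.04 Da apart) cannot be distinguished, and Leu and Ile are isobaric, so the sketch returns all candidates and leaves the final call to other knowledge such as the expected sequence.

```python
# Sketch: match mass differences in a carboxypeptidase ladder to residues
# using average residue masses (linear MALDI). The tolerance is an assumption.
from typing import Dict, List

RESIDUE_AVG: Dict[str, float] = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}


def assign_ladder(masses: List[float], tol: float = 0.5) -> List[List[str]]:
    """For each consecutive loss in a descending ladder, list candidate residues."""
    ladder = sorted(masses, reverse=True)
    calls = []
    for heavier, lighter in zip(ladder, ladder[1:]):
        delta = heavier - lighter
        calls.append(sorted(aa for aa, m in RESIDUE_AVG.items()
                            if abs(m - delta) <= tol))
    return calls


# Residues removed from the C-terminus, in order of removal.
print(assign_ladder([5445, 5317, 5204]))
```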


ESI-MS  of  Proteins  Isolated  from  Two-Dimensional  Gels 

Samples thought to be histones H2A and H2B were submitted as streaks on blots from two-dimensional gels. The H2B sample yielded the expected sequence, but the H2A sample was N-terminally blocked. To characterize these proteins further, the blots were extracted with hexafluoroisopropanol and analyzed by electrospray ionization mass spectrometry. ESI mass spectra were acquired using the PATRIC array detector at low microchannel plate voltages to discriminate against low molecular weight background ions from the excess Coomassie blue stain. The measured molecular weights corresponded to the expected 14 kDa proteins with multiple acetylations and multiple amino acid substitutions (Figure 5). The material not consumed in the ESI-MS experiment was digested by the addition of cyanogen bromide, and the two sequences obtained corresponded to the amino terminus of H2A and to the sequence following Met51.

Figure 4. MALDI mass spectra of carboxypeptidase Y-digested EGF 1-48 after (a) 1 min and (b) 60 min.

Figure 5. ESI mass spectra of (a) histone H2A and (b) histone H2B isolated from 2-D gels and extracted from blots in the presence of excess Coomassie blue stain.
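Cyanogen bromide cleaves C-terminal to methionine residues, so each fragment presents a new N-terminus to Edman chemistry. This is why the histone digest could yield a sequence beginning immediately after a methionine. The simplified sketch below only predicts the fragments. It ignores incomplete cleavage (for example at Met-Ser/Thr bonds) and the conversion of Met to homoserine or its lactone. The input sequence is hypothetical.

```python
# Sketch: cyanogen bromide cleaves C-terminal to methionine; the Met becomes
# homoserine (or its lactone) in the released fragment. Cleavage-efficiency
# effects are ignored in this simplified version.
from typing import List


def cnbr_fragments(sequence: str) -> List[str]:
    """Split a sequence after every Met ('M' marks the residue converted to Hse)."""
    fragments, current = [], ""
    for aa in sequence:
        current += aa
        if aa == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


print(cnbr_fragments("AKTMGSWEMK"))  # hypothetical sequence
```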

ESI-MS  of  Disulfide-Containing  Proteins 

The  number  of  disulfide  bonds  in  a  protein  can  be  rapidly  determined  by  ESI  mass 
spectrometry.  As  has  been  shown  by  a  number  of  groups  (Feng,  Bell  et  al.,  1990;  Loo, 
Edmonds  et  al.,  1990),  the  measured  mass  difference  between  the  oxidized  form  and  the 
disulfide-reduced  form  allows  one  to  determine  the  number  of  disulfides.  For  example,  a 
measured  mass  difference  of  8  daltons  corresponds  to  4  disulfide  bonds  (2  daltons  for  each 
disulfide bond). The use of cysteine blocking agents (e.g., iodoacetamide or 4-vinylpyridine)
would  certainly  lessen  the  mass  accuracy  requirements  of  the  experiment.  However,  we 
wished  to  test  the  accuracy  of  the  method  with  as  few  complications  as  possible. 
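The arithmetic behind the method is simple. Reducing each S-S bond to two SH groups adds two hydrogen atoms (2 x 1.007825 = 2.01565 Da monoisotopic), so the number of disulfides is the reduced-minus-oxidized mass difference divided by that value, rounded to the nearest integer. The function names below are illustrative.

```python
# Sketch: estimate the number of disulfide bonds from the mass increase on
# reduction; each S-S -> 2 SH adds two hydrogen atoms (monoisotopic 2.01565 Da).
H2 = 2 * 1.007825


def expected_shift(n_disulfides: int) -> float:
    """Mass increase expected on reducing `n_disulfides` disulfide bonds."""
    return n_disulfides * H2


def count_disulfides(delta_mass: float) -> int:
    """Nearest whole number of disulfides for a measured (reduced - oxidized) shift."""
    return round(delta_mass / H2)


print(f"{expected_shift(3):.2f}")  # 6.05 Da for three disulfides
print(count_disulfides(8.0))       # 4, matching the 8-dalton example
```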

Table 1 lists the ESI-MS results for a number of disulfide-containing peptides before and after reduction. The samples were dissolved in 70% acetonitrile, 0.1% trifluoroacetic acid, and the oxidized form of each peptide was analyzed without further workup. To reduce the peptide, tris(2-carboxyethyl)phosphine (TCEP) was added to a final concentration of 20 mM and the solution was heated to 60°C for 2 to 4 minutes. The sample was again analyzed without further workup. ESI-MS mass measurements were obtained at a resolving power greater than 5000. Peptides such as melittin (2845 Da), substance P (1347 Da), or gramicidin S (1141 Da) were added to the solution to provide reference peaks (internal standards) for more accurate mass determination. Data were acquired by scanning the accelerating voltage at constant magnetic field strength over the mass-to-charge region around the multiply charged ion (and reference peak) of interest (Cody, Tamura et al., 1992).

Table 1. Application of high-resolution ESI-MS for disulfide determination

Peptide      MW (sequence)   MW (measured)   ΔMW (meas. - seq.)   MW (S-S red.)   ΔMW (red. - ox.)
Conotoxin    2800.10         2800.09         -5.4 ppm             2806.16         6.07
Apr15C       ?               3674.56         n/a                  3680.62         6.07
Apr15A       ?               4389.27         n/a                  4395.35         6.08
ATX II       4945.26         4945.25         -2.2 ppm             4951.31         6.06
BPTI         6507.04         6507.03         -2.1 ppm             6513.06         6.03

Note: Molecular weight values are monoisotopic masses (Da). The theoretical MW difference between the disulfide-reduced form and the oxidized species containing 3 disulfides is 6.05 Da.
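The ppm deviations in Table 1 follow the usual definition, (measured - theoretical)/theoretical x 10^6. The sketch below recomputes them from the tabulated values. Table 1 lists masses rounded to 0.01 Da, so the recomputed figures differ slightly from the printed ones. We assume the printed values were derived from unrounded masses. The names `table1` and `ppm_error` are illustrative only.

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# (sequence-derived, measured) monoisotopic masses from Table 1, in Da.
table1 = {
    "Conotoxin": (2800.10, 2800.09),
    "ATX II": (4945.26, 4945.25),
    "BPTI": (6507.04, 6507.03),
}

for name, (theoretical, measured) in table1.items():
    err = ppm_error(measured, theoretical)
    # Each tabulated value may be off by up to 0.005 Da, so their
    # difference may be off by up to 0.01 Da.
    rounding_ppm = 0.01 / theoretical * 1e6
    print(f"{name}: {err:+.1f} ppm (rounding uncertainty up to {rounding_ppm:.1f} ppm)")
```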

Figure 6 shows the ESI mass spectrum of Anemonia sulcata toxin II, Ile-isotoxin (ATX II), a 47-residue polypeptide. Melittin and ACTH (18-39, 2466 Da) were added to the ATX II-containing solution as internal calibrants for more accurate mass determination. A portion of the high-resolution mass spectrum is shown in Figure 7. In general, the mass accuracy of this high-resolution method is around 5-10 parts per million (ppm). This accuracy is more than sufficient to detect a mass difference of 2 daltons (1 disulfide bond), because at 5000 Da, 10 ppm corresponds to only 0.05 Da. The high-resolution capability of the magnetic sector mass spectrometer is also useful for detecting the small amount of unreduced protein observed in the spectrum in Figure 8.
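A rough calculation shows why a resolving power above 5000 suits proteins of this size. It uses the simple definition R = M/ΔM. Separating adjacent isotope peaks of ATX II, which lie about 1.0034 Da apart (the 13C-12C mass difference), needs R of roughly 4900. Separating species that differ by one disulfide (2.016 Da) needs only about half that. The charge state cancels out, since both m/z and the peak spacing are divided by z. This is a back-of-the-envelope sketch. It ignores peak shape and the exact resolution definition (e.g. 10% valley versus FWHM) used on the instrument.

```python
ISOTOPE_SPACING = 1.00336   # 13C - 12C mass difference (Da)
DISULFIDE_SHIFT = 2.01565   # two H atoms gained per reduced disulfide (Da)


def resolving_power_needed(mass: float, separation: float) -> float:
    """Resolving power M/dM needed to separate peaks `separation` Da apart.

    The charge z cancels: dividing both the mass and the spacing by z
    leaves the ratio unchanged.
    """
    return mass / separation


atx_ii = 4945.25  # measured monoisotopic mass of ATX II from Table 1 (Da)
for label, spacing in [("isotope peaks", ISOTOPE_SPACING),
                       ("one disulfide", DISULFIDE_SHIFT)]:
    print(f"{label}: R >= {resolving_power_needed(atx_ii, spacing):.0f}")
```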


Figure 6. Electrospray ionization mass spectrum of ATX II toxin.

Figure 7. Portion of the high-resolution (resolving power > 5000) mass spectrum of the ATX II/melittin mixture.

Figure 8. Mass spectrum of disulfide-reduced ATX II, showing the presence of the oxidized protein.


CONCLUSION

Protein analytical chemistry has progressed rapidly in the last several years. The development of electrospray ionization and matrix-assisted laser desorption/ionization has opened numerous research opportunities in biochemistry. Ten years ago, measuring the molecular weight of a 50 kDa protein to better than 0.05% accuracy by mass spectrometry, and detecting the protein at the sub-picomole level, was unimaginable. It is becoming increasingly common for a protein chemist to isolate picomole amounts of a protein. The chemist then either obtains a mass spectrum directly or walks down the hallway and hands the sample to the “mass spectrometrist” for analysis. As instrumentation and methodologies based on ESI and MALDI improve in the near future, this interaction between the biochemist and the analytical chemist will become common practice.
INTRODUCTION

For more than 25 years, protein identification has largely depended on automated Edman chemistry (Hewick et al., 1981) or on western blotting with an appropriate monoclonal antibody. Several limitations, however, have never been overcome. The Edman procedure is inherently slow (generally one or two peptide or protein samples per day) and does not allow direct identification of many post-translational modifications. In addition, current detection limits are in the low-picomole to upper-femtomole range (Totty et al., 1992). Protein identification by western blotting can be extremely rapid, but requires the ready availability of an extensive library of suitable antibody probes. Large-format 2D electrophoresis systems now make it possible to resolve several thousand cellular proteins from whole-cell lysates at amounts in the low- to upper-femtomole range (Patton et al., 1990), presenting significant analytical challenges. The recent introduction of matrix-assisted laser-desorption (MALD) time-of-flight mass spectrometers (Karas and Hillenkamp, 1988) has made possible the rapid, high-sensitivity analysis of peptide mixtures. New strategies combining protease digestion, MALD mass spectrometry and searching of peptide-mass databases promise to greatly accelerate the identification of proteins (Henzel et al., 1993; Pappin et al., 1993; Mann et al., 1993; James et al., 1993; Yates et al., 1993).

Microsequence analysis of proteins electroblotted onto PVDF membranes following purification by SDS PAGE has become an essential tool for the protein chemist. Several procedures have been described for enzymatic cleavage of proteins bound to nitrocellulose or PVDF transfer membranes (Aebersold et al., 1987; Bauw et al., 1989; Fernandez et al., 1992). All these procedures require pre-treatment of the membranes with polymers such as PVP-40 to prevent adsorption and denaturation of the proteolytic enzyme. This approach considerably increases the background contamination during subsequent reverse-phase purification of peptides. It also increases the time required per sample for extensive washing and extraction of residual polymer. A significant improvement came in the later work of Fernandez et al. (1994), in which the use of hydrogenated Triton X-100 made blocking with PVP-40 unnecessary. One drawback of heteropolymeric detergents such as Triton, however, is that residual detergent interferes significantly with the analysis of peptides by MALD mass spectrometry. We report here on the development of simplified digestion methods using octyl glucoside. These methods allow rapid, single-step digestion of electroblotted proteins in a form suitable both for analysis by MALD mass spectrometry and for conventional Edman microsequencing. We also report on the application of the procedure to the analysis of cellular proteins resolved by large-format 2D electrophoresis.


MATERIALS AND METHODS

2D SDS Polyacrylamide Gel Electrophoresis (PAGE)

Human Myocardial Proteins. Samples of human ventricular myocardium were taken from explanted hearts at the time of cardiac transplantation and frozen in liquid nitrogen. Frozen tissue specimens were then crushed between two cooled metal blocks. The resulting powder was homogenised in 1% w/v SDS and spun at 10,000 g for 5 min. It was then rehomogenised and recentrifuged before the supernatant was harvested. Protein concentration was determined by the Bradford dye-binding assay, and the samples were stored frozen at -80°C.

Preparative 2D PAGE was performed using the Millipore Investigator system essentially as described by Patton et al. (1990). First-dimension isoelectric focusing (IEF) was carried out in preparative rod gels (210 mm x 3 mm) containing 2.6% w/v acrylamide, 9 M urea, 4% w/v CHAPS, 1% w/v DTT, 2% v/v Resolyte pH 4-8 and 0.05% w/v bromophenol blue. Samples containing up to 1 mg of total protein were applied to each gel. Gels were focused at 800 V for 35,000 volt-hours (Vh), extruded onto Parafilm strips and stored at -80°C. Second-dimension electrophoresis was performed overnight on 230 x 200 x 1.5 mm SDS-PAGE gels (12% w/v acrylamide) with 2 cm stacking gels at 3000 mW/gel (cooled to 10°C).

Following electrophoresis, the 2D gels were equilibrated for 30 min in 50 mM Tris/boric acid buffer, pH 8.5 (Baker et al., 1991). Proteins were then electroblotted onto FluoroTrans membranes (Pall Corp.) for 6 hr at 500 mA (10°C) using the ISO-DALT system (Hoefer). They were visualised by staining with Coomassie brilliant blue or sulforhodamine B as described by Coull and Pappin (1990).

Mouse Brain Proteins. Brain tissue was prepared from parental, F1 and back-cross progeny. Each brain was rinsed in physiological NaCl solution, freed from any blood vessels, wiped dry and frozen in liquid nitrogen. Each frozen brain was pulverised in the presence of protease inhibitors in a minimum volume (v) of buffer (v in µl = brain weight in mg x 0.03), and the ground tissue was centrifuged at 145,000 g for 30 min. Pulverised material was then prepared and analysed by 2D gel electrophoresis essentially according to the method of Klose (1975). It was electroblotted onto Immobilon-P PVDF membrane (Millipore) and stained with sulforhodamine B as above.

1D SDS PAGE

Proteins were subjected to SDS polyacrylamide gel electrophoresis essentially according to Laemmli (1970). Gels were allowed to polymerise overnight to fully quench polymer free radicals. Gel and running buffers contained only 0.02% w/v SDS, with 0.02% v/v thioglycolic acid added to the running buffer as a scavenger. Proteins were electroblotted onto Immobilon-P (PVDF) transfer membranes using the MilliBlot-SDE semi-dry electroblotting system (Millipore) and the low-ionic-strength discontinuous 6-aminohexanoic acid buffer system. Transfer was typically accomplished by electroblotting at 1-1.5 mA/cm² of gel surface area for 45-60 minutes at constant current. The PVDF transfer membrane was then washed with deionised water to remove buffer salts (two changes of 200 ml, 10 minutes each, with mild agitation). It was then blotted dry with Whatman 3MM filter paper and thoroughly dried in vacuo for at least 20 minutes. Blotted proteins were visualised by staining with sulforhodamine as described above.

Digestion Procedure and Mass Spectrometric (MS) Analysis

Dried, stained spots (typically a few square mm in area) were placed in 0.5 ml Eppendorf tubes. They were wetted with 2-4 µl of 50 mM ammonium bicarbonate solution containing 1% w/v octyl glucoside and 40 ng/µl trypsin (Promega, modified). Enzyme solutions should be prepared immediately before use to minimise autodigestion. With the sulforhodamine stain it was not usually necessary to pre-wet the membrane pieces, destain, or block with polymers such as PVP-40. Digestion was performed overnight at 30°C. The following day, 10-20 µl of formic acid:ethanol (1:1 v/v, freshly prepared) was added to each sample. The solution was allowed to stand for 30-60 minutes so that peptides could diffuse from the membrane surface. Small aliquots (typically 0.5 µl) were taken directly from the supernatant and applied to sample slides or strips. They were dried under high vacuum for at least 30 minutes to remove residual ammonium salts. Each dry sample was then re-wetted with 0.5 µl of matrix solution: 1% w/v alpha-cyano-4-hydroxycinnamic acid in 50% aqueous acetonitrile containing 0.1% TFA and 200 fmol/µl oxidised insulin B chain. The sample was allowed to air-dry and analysed by MALD time-of-flight mass spectrometry on a Finnigan MAT LaserMat 2000 mass spectrometer (Mock et al., 1992). Spectra were calibrated using the insulin B chain as an internal standard. Observed proteolytic fragment masses were screened against the MOWSE peptide-mass database as described by Pappin et al. (1993).
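The database-screening step can be illustrated with a deliberately simplified sketch. MOWSE itself uses a more elaborate, frequency-weighted score (Pappin et al., 1993). The code below only counts how many observed masses fall within a fixed tolerance of each candidate's theoretical peptide masses. The protein names, masses and the 0.5 Da tolerance are all hypothetical.

```python
from typing import Dict, List, Tuple


def count_matches(observed: List[float], theoretical: List[float], tol_da: float) -> int:
    """Number of observed masses lying within tol_da of any theoretical mass."""
    return sum(
        any(abs(obs - theo) <= tol_da for theo in theoretical) for obs in observed
    )


def rank_candidates(observed: List[float], database: Dict[str, List[float]],
                    tol_da: float) -> List[Tuple[str, int]]:
    """Rank database entries by their simple match count, best first."""
    scores = [(name, count_matches(observed, masses, tol_da))
              for name, masses in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical masses (Da) for illustration only; not data from this study.
database = {
    "protein_A": [842.5, 1045.6, 1479.8, 2211.1],
    "protein_B": [900.4, 1045.6, 1800.9],
}
observed = [842.6, 1045.5, 1479.7, 1999.0]
print(rank_candidates(observed, database, tol_da=0.5))
# [('protein_A', 3), ('protein_B', 1)]
```

A plain match count favours large proteins, which simply contribute more peptides. Frequency weighting of the kind MOWSE uses is one way to correct for that bias.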

The main practical difficulty with the laser instruments is finding the optimum peptide concentration (particularly to overcome quenching). Typically, this was achieved by diluting the digest supernatant. After an initial 0.5 µl was sampled from the digest as above, a further 10 µl of the formic acid:ethanol mix was added. The sample was allowed to stand for 30 min, and another 0.5 µl was taken for MS analysis as above. This was repeated as many times as necessary to obtain the optimum spectrum.
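The effect of this stepwise dilution can be estimated with a short calculation. It is a minimal sketch that assumes all peptides are already in solution, so each 10 µl addition simply dilutes them. In practice, more peptide may diffuse from the membrane during each 30 min stand. The 18 µl starting volume and the function name `dilution_series` are hypothetical.

```python
from typing import List


def dilution_series(v_start_ul: float, steps: int, sample_ul: float = 0.5,
                    add_ul: float = 10.0) -> List[float]:
    """Relative peptide concentration at each successive 0.5 ul sampling.

    Assumes all peptides are already in solution, so each 10 ul addition
    simply dilutes them (further diffusion from the membrane is ignored).
    """
    conc, volume = 1.0, v_start_ul
    levels = [conc]
    for _ in range(steps):
        volume -= sample_ul            # aliquot removed for MS analysis
        new_volume = volume + add_ul   # fresh formic acid:ethanol added
        conc *= volume / new_volume    # same peptide amount, larger volume
        volume = new_volume
        levels.append(conc)
    return levels


# Hypothetical start: 3 ul digest buffer plus 15 ul formic acid:ethanol.
for i, c in enumerate(dilution_series(18.0, 3)):
    print(f"sampling {i}: {c:.2f} x initial concentration")
```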

Purification of Peptides by Reverse-Phase HPLC

In cases where MS fingerprint analysis failed to identify the protein, the remaining volume of digest supernatant was collected and dried in vacuo. It was redissolved in 10-30 µl of 0.5% v/v heptafluorobutyric acid (HFBA) in water. The samples were then injected onto narrow-bore C8 or C18 reverse-phase columns (e.g. Aquapore RP-300, 2.1 mm x 10 cm) equilibrated with 0.025% v/v HFBA. Peptides were eluted with linear gradients (2-80%) of acetonitrile containing 0.05% v/v TFA at 0.1-0.2 ml/min over 60-80 minutes, with detection at 220 nm.
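For a linear gradient, the programmed acetonitrile concentration at any time follows directly from the start, end and duration. The sketch below uses the 2-80% over 80 min gradient applied later to the nucleolin digest. It shows, for example, that 40% acetonitrile is reached after about 39 min of programmed time. The function names are hypothetical, and the sketch ignores the dwell (delay) volume of the HPLC system.

```python
def percent_b(t_min: float, start: float = 2.0, end: float = 80.0,
              duration_min: float = 80.0) -> float:
    """Acetonitrile (%) programmed at time t of a linear gradient (no dwell volume)."""
    t = min(max(t_min, 0.0), duration_min)
    return start + (end - start) * t / duration_min


def time_at_percent(pct: float, start: float = 2.0, end: float = 80.0,
                    duration_min: float = 80.0) -> float:
    """Inverse of percent_b: when the programmed gradient reaches pct."""
    return (pct - start) / (end - start) * duration_min


# 2-80% over 80 min: the slope is 78/80 = 0.975% acetonitrile per minute.
print(percent_b(40.0))                   # 41.0
print(round(time_at_percent(40.0), 1))   # 39.0
```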

Peptide Microsequencing

HPLC-purified peptides were collected and dried onto 8 mm arylamine-substituted PVDF membranes (Millipore). They were covalently immobilised via carboxyl groups as described by Coull et al. (1991). Solid-phase microsequence analysis was performed on a MilliGen 6600 sequencer as described by Pappin et al. (1990).


RESULTS AND DISCUSSION

Digestion Procedure

The digestion procedure reported here was developed for rapid screening of proteins by peptide-mass fingerprinting, primarily proteins from large-format 2D gel electrophoresis of whole-cell lysates. A brief comparison of the process with more typical on-membrane digestion procedures is shown in Figure 1. The key differences are the use of octyl glucoside, very low digestion volumes (typically only a few µl) and the use of formic acid and ethanol to achieve efficient elution of digested peptides. Because each sample requires minimal time and manipulation, many dozens of samples can be processed in parallel.

Proteins are transferred to PVDF membranes, stained, and digested in the presence of 1% w/v octyl glucoside. With this treatment it is not necessary to pre-wet the membrane pieces or to block with polymers such as PVP-40. If proteins are stained with sulforhodamine B or Ponceau S, there is also no need to destain. Underivatised PVDF membranes and conventional stains can therefore be used. This avoids the more complicated staining and digestion procedures required with positively charged (cationic) blotting membranes such as Immobilon CD (Patterson et al., 1992). The low digestion volumes, typically just sufficient to wet the membrane pieces, significantly improve digestion kinetics for a given concentration of enzyme. For reasons that are not yet clear, we have found that lower digestion temperatures (room temperature up to 30°C) give more complete digestion than 37°C in otherwise identical experiments.

Following overnight digestion, peptides are passively eluted from the membrane surface with formic acid and ethanol (1:1 v/v). Small aliquots can be sampled directly from the digest supernatant and analysed by matrix-assisted laser-desorption (MALD) mass spectrometry. Heteropolymeric detergents have been reported for this purpose (e.g. hydrogenated Triton X-100, Nonidet, Emulphogene). In contrast, up to 1% w/v octyl glucoside has no apparent effect on the ionisation efficiency of digested peptide mixtures. In addition, because of its low molecular weight (292 Da), this detergent does not interfere with peptide mass determination in the 700-3000 Da range. The described procedure thus has clear advantages over the use of hydrogenated Triton X-100 reported by Fernandez et al. (1994). Residual buffer salts and stains such as sulforhodamine B or Ponceau S also cause minimal suppression of the laser-desorption process. This contrasts sharply with Coomassie blue, which has a very significant suppressive effect.

The digestion procedure was tested over a period of several months on proteins recovered following 1D and 2D gel electrophoresis. The proteins were identified by peptide-mass fingerprinting using the MOWSE database (Pappin et al., 1993). A selection of proteins successfully identified using this process is shown in Table I. The identities of all the samples shown in Table I were confirmed by microsequencing, by western blotting with an appropriate antibody or (in two cases) by mapping to a precise genetic locus. Samples included human myocardial proteins, recombinant proteins, viral capsid and regulatory proteins, and proteins resolved by 2D gel electrophoresis of whole mouse brain. Sample molecular weights ranged from 14.5 kDa to over 376 kDa.

The efficiency of digestion and recovery of peptides was assessed by several parameters, summarised in Table II. The data were derived entirely from proteins correctly identified by peptide-mass fingerprinting whose identities were confirmed by sequence analysis, western blotting or genetic mapping.

In the 42 proteins analysed by peptide-mass fingerprinting, a total of 281 observed peptide masses could be matched to expected peptide sequences. Detailed analysis of the mass spectra recorded for several proteins showed that between 60 and 80% of all expected peptides between 700 and 3000 Da were present in the digest supernatant (passively eluted from the membranes by the formic acid:ethanol solution). In a few cases, such as the 161 kDa nitric oxide synthetase (Table I), almost 90% of all possible peptides in this size range were present in the mass spectrum. Matched peptide masses were further sorted into ‘perfect’ or ‘partial’ cleavage products. For trypsin, the MOWSE database classifies a cleavage as ‘perfect’ if it has occurred directly C-terminal to a lysine or arginine residue, except where the adjacent residue is proline (Pappin et al., 1993). The database also matches all nearest-neighbour partial cleavages, in which cleavage between two adjacent ‘perfect’ fragments has failed to occur.

Examination of the 281 matched peptide masses showed that the large majority (85.8%) derived from ‘perfect’ enzyme cleavages. Only 14.2% resulted from nearest-neighbour partial cleavages, a ratio of about 6:1 in favour of complete cleavage. For comparison, Table II also includes the same data obtained from 23 different protein samples digested in solution (reported in the earlier study of Pappin et al., 1993). There is almost exact correspondence in all categories. These include the percentage of possible peptides observed in the mass spectra and the ratio of ‘perfect’ to ‘partial’ fragments. By these criteria, on-membrane PVDF/octyl glucoside digestion followed by passive elution with formic acid and ethanol gives results indistinguishable from digestion in solution. The validity of this result is strengthened by the fact that the data were obtained on a large number of proteins from very different biological sources.

Table I. Protein samples identified by peptide-mass fingerprinting after enzymatic digestion using the PVDF/octyl glucoside procedure

Table II. Assessment of digestion efficiency and recovery of peptides by passive elution with formic acid:ethanol

Trypsin                       Solution digest   PVDF/octyl glucoside
Total protein samples         23                42
Total matched peptides        168               281
% Observed (700-3000 Da)      65-85             60-80
% Perfect cleavage            85.1              85.8
% Partial cleavage (nnp*)     14.9              14.2

*nnp: nearest-neighbour pair
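The cleavage rule described above can be expressed compactly. The sketch below generates ‘perfect’ tryptic fragments (cleavage after K or R unless the next residue is P). It then forms the nearest-neighbour pair (nnp) partials, in which exactly one cleavage between adjacent fragments has failed. It is an illustration of the rule as stated in the text, not the MOWSE code. The example sequence and function names are hypothetical.

```python
import re
from typing import List


def tryptic_peptides(sequence: str) -> List[str]:
    """'Perfect' tryptic cleavage: after K or R unless the next residue is P."""
    # Zero-width split at every position preceded by K/R and not followed by P.
    # A trailing K/R produces an empty final piece, which is filtered out.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def nearest_neighbour_pairs(peptides: List[str]) -> List[str]:
    """Partial products in which one cleavage between adjacent peptides failed."""
    return [peptides[i] + peptides[i + 1] for i in range(len(peptides) - 1)]


seq = "MKWVTFISLLRPAKGDEKFR"  # hypothetical sequence for illustration
perfect = tryptic_peptides(seq)
print(perfect)
print(nearest_neighbour_pairs(perfect))
```

Splitting on a zero-width pattern requires Python 3.7 or later.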

Influence of Adjacent Residues on Proteolytic Cleavage

Analysis of all observed nearest-neighbour pair (nnp) partial-cleavage fragments revealed how the residues C-terminal to a potential cleavage site influence cleavage (Figure 2).

In the case of trypsin, almost 50% of observed partial cleavages occurred where the enzyme was presented with adjacent lysine or arginine residues (RR, RK, KK and KR pairs or longer repeats). This can result in peptides with pairs of basic residues at the C-terminus or with a single uncleaved lysine or arginine residue at the N-terminus (observed in almost equal numbers). This reflects the well-documented poor exopeptidase efficiency of trypsin (Allen, 1989). The second largest group (approx. 25%) arises from inhibition of cleavage where the cleavage site is followed by one of the large hydrophobic residues L, I, F or V. Here, steric hindrance slows the absolute rate of cleavage. Trypsin also seems to be adversely influenced by the close proximity of acidic residues (E or D). The eight residues E, D, L, I, F, V, K and R are thus associated with 88% of all observed incomplete cleavages. The remaining 12 common amino acids are present in only 12% of cases.
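A tally of this kind can be sketched as follows. For each internal K or R left uncleaved in a partial fragment, the code classifies the residue immediately C-terminal to it (P1') into the groups discussed above. This is a simplification under stated assumptions. The text also refers to acidic residues in "close proximity", whereas the sketch inspects only the adjacent residue. The fragments and function names are hypothetical.

```python
from collections import Counter
from typing import Iterator

BASIC, HYDROPHOBIC, ACIDIC = set("KR"), set("LIFV"), set("ED")


def classify_missed_site(residue_after: str) -> str:
    """Group a missed K/R site by the residue immediately C-terminal to it."""
    if residue_after in BASIC:
        return "basic (KK, KR, RK, RR)"
    if residue_after in HYDROPHOBIC:
        return "large hydrophobic (L, I, F, V)"
    if residue_after in ACIDIC:
        return "acidic (E, D)"
    return "other"


def missed_sites(peptide: str) -> Iterator[str]:
    """Yield the residue after each internal K/R that trypsin should have cut."""
    for i, aa in enumerate(peptide[:-1]):
        if aa in BASIC and peptide[i + 1] != "P":
            yield peptide[i + 1]


partials = ["GDEKKLR", "AVKLGR", "TLREAK"]  # hypothetical nnp fragments
tally = Counter(classify_missed_site(r) for p in partials for r in missed_sites(p))
print(tally.most_common())
```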

HPLC Purification of Recovered Peptides

The digestion method was developed principally to allow rapid screening of multiple samples by peptide-mass fingerprinting using MALD mass spectrometry. Sometimes analysis of small aliquots of the digest supernatant failed to identify the protein. In such cases, the remaining bulk of the peptides can be purified by narrow- or microbore reverse-phase HPLC for conventional Edman microsequencing.

Figure 2. Observed frequencies of residues adjacent to potential cleavage sites that result in incomplete (partial) cleavage (y-axis: relative frequency).

An example of a typical HPLC purification is shown in Figure 3A.

Approximately 15 pmol of a protein with an apparent Mw of 80,000 Da (SDS PAGE) was blotted, stained with sulforhodamine B and digested on the PVDF membrane as described. Peptide fingerprint analysis of the tryptic fragments (Figure 4) identified the protein as human C23 nucleolin. This analysis used only 2% of the total digest sample (0.5 µl sampled from 25 µl total volume).
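This fraction sets an upper bound on how much peptide the MS analysis actually consumed. The sketch below assumes, as an idealisation, complete digestion and complete elution into the supernatant. Under those assumptions, a 0.5 µl aliquot of a 25 µl digest of 15 pmol carries at most about 300 fmol of each peptide. The function name is hypothetical.

```python
def amount_analysed_fmol(total_pmol: float, aliquot_ul: float, total_ul: float) -> float:
    """Upper bound on peptide amount in one MS aliquot, in femtomoles.

    Assumes complete digestion and complete elution into the supernatant.
    """
    return total_pmol * 1000.0 * aliquot_ul / total_ul


print(amount_analysed_fmol(15.0, 0.5, 25.0))  # 300.0
```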

The remaining bulk of the digest supernatant (approx. 23 µl) was collected and dried in vacuo. It was redissolved in 20 µl 0.5% v/v HFBA and injected onto an Aquapore RP-300 (C8) column (2.1 mm x 10 cm) equilibrated with 0.025% v/v HFBA. No attempt was made to re-extract peptides from the residual PVDF membrane pieces by repeated solvent washes. Peptides were eluted with a gradient of acetonitrile containing 0.05% v/v TFA (2-80% over 80 minutes at 0.2 ml/min). The residual 1% octyl glucoside minimises non-specific adsorption of peptides onto the walls of the tube and aids solubilisation. On injection, the detergent is not significantly retained and elutes at the column void (V). HFBA is very important at this stage because it acts as a hydrophobic ion-pairing agent that retains small peptides. Without HFBA, small peptides are eluted by the residual detergent. As the acetonitrile concentration rises, the HFBA counterion is replaced by TFA, reducing the overall hydrophobicity of the bound peptides. The concentration of TFA in the eluting acetonitrile (0.05% v/v) is sufficient to balance the UV absorbance at 214-220 nm. As noted by Fernandez et al. (1994), HPLC background profiles are significantly improved by the absence of PVP-40, particularly when working at high sensitivity. The only major contaminating peak, eluting at approximately 40% acetonitrile, is residual sulforhodamine B dye (SR). A number of peptides were collected and subjected to solid-phase microsequence analysis as described in Methods (sequences shown in Fig. 3A). Initial yields were between 3 and 6 pmol. All peptide sequences corresponded exactly to sequences reported for human nucleolin C23 (Srivastava et al., 1990).

Figure 3. HPLC profile of peptides derived by on-membrane digestion of 15 pmol human C23 nucleolin (experimental conditions described in the text). (3A) Peptides recovered by passive elution into formic acid and ethanol added post-digest. (3B) Repeat elution gradient of washings of the residual membrane pieces with aq. propanol and TFA. (V) represents the column void, (SR) the excess sulforhodamine B stain and (B) peaks present in blank gradients. Five peaks were collected and sequenced by solid-phase microsequencing (see text).

Figure 4. MALD time-of-flight mass spectrum of the tryptic peptides derived by on-membrane digestion of 15 pmol human C23 nucleolin. The spectrum was obtained using 0.5 µl of the digest supernatant, corresponding to approximately 2% of the total sample.

The remaining membrane pieces were then washed with 100 µl aliquots of 70% v/v isopropanol/0.1% v/v TFA (twice) and 100 µl aliquots of TFA (twice). The pooled washings were dried, redissolved in 20 µl 1% v/v HFBA and chromatographed under identical conditions (Figure 3B). The profile contains a small number of background peaks also present in control blank HPLC gradients (labelled B) and a small amount of residual stain (SR). Apart from these, it shows that 85-90% of the peptides had already been recovered from the membrane by passive elution into the added formic acid:ethanol solution. When handling large numbers of samples (several hundred), it is therefore not necessary to wash the membrane pieces exhaustively by repeated solvent extraction. The results clearly demonstrate that PVDF/octyl glucoside digestion can be used as a one-step procedure for preparing peptides. These peptides are suitable both for analysis by MALD mass spectrometry and for high-sensitivity Edman microsequencing.

Peptide Mapping of Cellular Proteins

Over the past 12 months we have been using the described digestion procedure to analyse cellular proteins resolved by large-format 2D gel electrophoresis. Davison and Davison (1994) have used the method to identify proteins of channel catfish virus (a member of the herpesvirus family) separated by SDS PAGE. Their study led to the identification of 16 principal virion or capsid proteins encoded by 12 genes (11 viral and one from the host cell). Identifying viral proteins in this manner is particularly apt because complete genome information (and hence all possible protein-coding regions) is available for many of the viruses under study. Corbett et al. (1994) have used the procedure to identify more than 50 human myocardial proteins resolved by 2D gel electrophoresis of human heart tissue (a few examples are shown in Table I). In our own laboratory, we have been analysing proteins resolved by 2D PAGE of whole mouse brain as part of a large-scale genetic mapping study (Vogel and Klose, 1992). Some of the identified proteins are shown in Table I, all identified using single spots from only one gel experiment. Our study is continuing, with more than 240 individual protein spots analysed to date. Several features of the preliminary data are worth discussing.

Estimation of protein amount is difficult. Judging from the observed intensity of the sulforhodamine stain and the signal intensities recorded during MS analysis, sample amounts have ranged from tens of picomoles to less than 100 femtomoles before enzyme digestion. Rough estimates can be made by comparing the observed peptide signal intensities with the standard loading of insulin B chain used as internal calibrant (see Methods). From our work to date, we are reasonably confident that useful spectra can be obtained from most of the proteins revealed by staining with the sulforhodamine dye. In some cases the spots were visible only under short-wave UV illumination, which fully exploits the fluorescent properties of the stain (Coull and Pappin, 1990). Such amounts are well below the level at which proteins can be visualised by staining with Coomassie blue. Proteins successfully identified included creatine kinase B-chain, gamma enolase, and lactate dehydrogenase H-chain (all brain isoforms). The last identification was of particular interest because it was confirmed by subsequent mapping of the spot to a precise genetic locus (J. Klose, personal communication). One surprising feature was the number of identified proteins that existed in more than one isoform. For example, gamma enolase was identified in at least six separate forms over an Mw range of 35-40 kDa and approximately 1 pH unit in the IEF dimension. This was later confirmed by the finding that all the relevant protein spots again mapped to the same genetic locus. More intriguingly, several forms of creatine kinase B-chain were observed, ranging in apparent Mw from 28 to 45 kDa and over nearly 2 pH units. More detailed analysis is required to determine the cause of such heterogeneity.

Other features noted are as follows. The use of Promega modified trypsin is particularly advantageous in that autolytic digestion products are minor and do not generally interfere with the peptide spectra. Methionine-containing peptides have frequently been observed in the oxidised form as the sulphoxide (+16 Da), the ratio of native to oxidised forms being variable and generally dependent on the age and/or mistreatment of blotted samples. In many instances we have observed cysteine-containing peptides as the beta-propionamide adduct (+71 Da) following reaction with unpolymerised acrylamide (Brune, 1992). Such modification can be essentially quantitative after 2D gel electrophoresis if no effort is made to maintain reducing conditions. We have observed, however, that inclusion of thiol scavengers in both dimensions can maintain cysteine residues as the free thiol (see Methods). One final observation is that arginine-containing peptides are more likely to be observed at extreme sensitivity, presumably because the much greater basicity of the guanidino side-chain, relative to a primary amine, significantly enhances ionisation efficiency. We are currently experimenting with modification of peptides with quaternary ammonium groups to further improve effective sensitivity in the presence of residual buffer, detergent and stain (Bartlet-Jones et al., 1994).
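The mass shifts quoted above can be used to predict which variants of a given peptide may show up in a fingerprint. The sketch below is not the authors' software: it enumerates Met sulphoxide (+16 Da) and Cys beta-propionamide (+71 Da) variants of a peptide, assuming neutral average masses built from standard average residue masses. The peptide "AMCK" and the function names are hypothetical.

```python
from itertools import product

# Standard average residue masses (Da) of the 20 common amino acids.
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153         # added once per peptide (free N- and C-termini)
OXIDATION = 15.9994     # Met -> Met sulphoxide (the "+16 Da" form)
PROPIONAMIDE = 71.0779  # Cys + free acrylamide -> Cys-S-beta-propionamide ("+71 Da")


def peptide_mass(seq: str) -> float:
    """Neutral average mass of an unmodified peptide."""
    return sum(AVG_RESIDUE[aa] for aa in seq) + WATER


def variant_masses(seq: str) -> dict:
    """Map (n_oxidised_Met, n_propionamide_Cys) -> neutral average mass."""
    base = peptide_mass(seq)
    n_met, n_cys = seq.count("M"), seq.count("C")
    return {
        (ox, pam): base + ox * OXIDATION + pam * PROPIONAMIDE
        for ox, pam in product(range(n_met + 1), range(n_cys + 1))
    }


if __name__ == "__main__":
    for (ox, pam), mass in sorted(variant_masses("AMCK").items()):
        print(f"Met(O) x{ox}, Cys-propionamide x{pam}: {mass:.2f} Da")
```

Listing every combination, rather than only the fully modified form, reflects the observation above that the native/modified ratio is variable, so several variants of one peptide may appear together.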


CONCLUSIONS 

We have developed the PVDF/octyl glucoside procedure for the rapid analysis of cellular proteins by peptide-mass fingerprinting. The method is sensitive enough to work in the sub-picomole range, and sufficient material can be derived from a single 2D gel separation. The procedure is now in routine use in a number of large-scale analytical projects. These studies have confirmed that the analysis of cellular proteins by peptide-mass fingerprinting can provide a rapid, direct link between protein and DNA information and may supersede the use of Edman sequencing for these and related projects.

SUMMARY

One hundred picomoles of a high mobility group (HMG) protein were isolated by reverse phase hplc from an extract of pea nuclei. Amino acid sequence and composition analysis of half the sample showed the protein to be blocked at the N-terminus. Of the remaining material, 40 pmol was subjected to digestion with proteinases, and the peptides from the tryptic digest were separated by reverse phase hplc. Each eluted peak was examined by matrix-assisted laser desorption/ionisation time-of-flight (MALDITOF) mass spectrometry and by amino acid sequence analysis. From the resulting information, it was clear that the target protein sequence corresponded to the inferred sequence of a previously isolated pea leaf cDNA encoding an HMG-I-like protein. The expressed protein was smaller than the DNA sequence suggested. From a series of further digests on 1-2 pmol of protein, the likely identity of the N-terminal block was established, as well as several sites of C-terminal processing. This work illustrates how extensive data can be derived from a small amount of protein by the combined use of sequence analysis and MALDITOF mass spectrometry.


INTRODUCTION 

High Mobility Group (HMG) proteins are abundant non-histone proteins associated with eucaryotic chromatin (Johns, 1982; Bustin et al., 1990). Their precise function is unclear, but they probably play a role in regulating gene expression by affecting the conformation of chromatin. HMG proteins are small (<30 kDa) and acid-soluble, with strongly acidic and strongly basic domains. As a group, they are subject to a variety of post-translational modifications (e.g. glycosylation, phosphorylation). In this study, structural features of an HMG protein from pea leaf nuclei are elucidated through mass and sequence analysis.


METHODS 

Preparation  of  HMG  Proteins 

HMG protein was prepared from pea shoots. Chromatin was isolated from homogenised tissue by differential centrifugation, and the bound HMG proteins were removed by a salt wash with 350 mM NaCl (Pwee et al., 1994). Unwanted proteins were removed by precipitation with 2% TCA; the soluble (HMG) proteins were recovered by acetone precipitation. Final purification of the HMG proteins was by reverse phase hplc (Aquapore C4, 2.1 × 30 mm) in 0.1% TFA with a gradient of acetonitrile/0.1% TFA. The position of HMG proteins in the eluate was determined by gel-retardation assays against an isolated DNA fragment (268 bp region of the pea plastocyanin promoter) (Pwee et al., 1994).

MALDITOF  Mass  Spectrometry 

Mass spectrometer - MALDI III (Kratos) operated in linear mode.

Matrix - α-cyano-4-hydroxycinnamic acid (Sigma), 10 mg/ml in 50% ethanol, 0.1% TFA, made fresh each day.

Sample preparation - 0.2 µl sample was mixed on-slide with 0.5 µl matrix and air dried. Each sample spot was then washed briefly with water to remove salts, which may suppress ionisation, and air dried again.

Calibration (external) of the appropriate mass ranges was done with a series of custom-made synthetic peptides (755.9-5961.9 Da) and horse heart myoglobin (16952 Da).
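In a linear time-of-flight analyser, flight time is, to a good approximation, a fixed offset plus a term proportional to the square root of m/z, so an external calibration can be sketched as a straight-line fit of sqrt(m/z) against flight time. The example below is a generic least-squares sketch, not the Kratos software. Only the calibrant masses come from the text (treated as singly charged ions); the flight times are synthetic values generated for illustration, and the function names are hypothetical.

```python
import math


def fit_tof_calibration(times, mz_values):
    """Least-squares fit of sqrt(m/z) = a * t + b for a linear TOF analyser."""
    ys = [math.sqrt(mz) for mz in mz_values]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    a = sxy / sxx
    b = mean_y - a * mean_t
    return a, b


def time_to_mz(t, a, b):
    """Convert a flight time into m/z using the fitted calibration."""
    return (a * t + b) ** 2


if __name__ == "__main__":
    # Calibrant m/z values from the text; flight times (arbitrary units) are synthetic.
    calibrants = [755.9, 5961.9, 16952.0]
    a_true, b_true = 0.02, 0.5
    times = [(math.sqrt(mz) - b_true) / a_true for mz in calibrants]
    a, b = fit_tof_calibration(times, calibrants)
    print(f"a = {a:.6f}, b = {b:.6f}")
    print(f"t = 6000 -> m/z {time_to_mz(6000.0, a, b):.1f}")
```

Because the calibration is external, its accuracy for a sample spot depends on how closely the sample's flight conditions match those of the calibrants, which is one reason calibrants spanning the mass range of interest are used.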


Proteolysis 

Sequence-grade proteinases (trypsin, Lys-C, Glu-C) were from Boehringer. Digestion was at 37°C for 0.5-3 h in 100 mM NH4HCO3, using a substrate:proteinase ratio of approx. 50:1. Samples were incubated in a closed microcentrifuge tube (0.4 ml) or in a microdigestion reactor (see below).
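As a worked example of the 50:1 ratio, the sketch below converts a substrate amount in picomoles into the corresponding mass of proteinase. It assumes the ratio is by weight and uses the approximate 17 kDa mass measured for the intact HMG protein; the helper names are hypothetical.

```python
def protein_ng(pmol: float, mass_kda: float) -> float:
    """Mass in ng: 1 pmol of a 1 kDa protein weighs 1 ng."""
    return pmol * mass_kda


def proteinase_ng(substrate_pmol: float, substrate_kda: float, ratio: float = 50.0) -> float:
    """Proteinase mass for a substrate:proteinase ratio expressed by weight."""
    return protein_ng(substrate_pmol, substrate_kda) / ratio


if __name__ == "__main__":
    for pmol in (40.0, 2.0):
        print(f"{pmol:g} pmol of ~17 kDa protein -> {proteinase_ng(pmol, 17.0):.2f} ng proteinase")
```

On these assumptions, 40 pmol of substrate (about 0.68 µg) calls for roughly 14 ng of enzyme, and a 2 pmol digest for well under 1 ng.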

Microdigestion  Reactor 

As part of this work, a quick and easy method was developed to allow digestions to take place in less than 1 µl of solution. A sheet of laboratory sealing film (e.g. Nescofilm), which has a non-wettable but mouldable surface, was placed on a metal plate. Protein solution, buffer and proteinases were mixed together on the surface in sub-µl amounts using a pipette tip. The centre of a screw-cap top from a 1.4 ml tube was filled with paper tissue and dampened with water. Excess water was blotted away, and the cap was centred over the digestion mixture and pressed firmly into the film surface, creating a small humidity chamber (Fig. 1). The metal plate was placed in an oven or hot room at 37°C for up to 3 hours. (A hot-block cannot be used, because the sample solvent distils into the cooler tissue.)

The sample was allowed to cool to room temperature, the cap was removed and the digest was sampled for mass spectrometry. The same protocol can be carried out directly on the sample slide using a gasket of film, but this requires more manipulation and prevents the use of adjacent sample positions.

Figure 1. Microdigestion reactor design for the digestion of sub-microlitre quantities of protein (labelled components: water-soaked tissue, metal plate).


RESULTS 

A purification profile of HMG proteins by reverse phase hplc is shown in Fig. 2.

Analysis of the arrowed protein (single band on SDS gels, 100 pmol by amino acid analysis) by MALDITOF mass spectrometry revealed a heterogeneous population (Fig. 3) with masses ranging from 16750 to 17150 Da. Mass analysis was unsuccessful on the neat fraction; the sample had to be concentrated 10× under vacuum to attain a satisfactory signal, and 2-3 pmol of protein was required. Mass analysis of a similarly sized standard protein (horse heart myoglobin) gave a homogeneous sharp peak, suggesting that the observed heterogeneity in the HMG protein was real and not an artefact of the analysis procedure. Such heterogeneity could result from ragged N- or C-terminal processing (if present) and/or incomplete post-translational modification at one or more sites.

Amino  Acid  Sequence  Analysis 

Sequence analysis of 50 pmol of the sample gave no N-terminal sequence. Quantitation of the material on the sequencer disc confirmed the protein to be N-blocked. An attempt to release a free N-terminus through “on-disc” cyanogen bromide digestion of the sequenced material was unsuccessful. Trypsin digestion of a small (1-2 pmol) sample of material and scanning of the fragment masses against the OWL database using the MOWSE program (Pappin et al., 1993) showed that the HMG protein sequence was absent from the database. It was therefore necessary to commit a substantial amount of the remaining material to fragmentation for internal sequence analysis.

Figure 2. Reverse phase hplc purification of HMG protein from pea nuclei.

Figure 3. Mass analysis of intact purified HMG protein, demonstrating the heterogeneity of molecular size. Calibration was performed using myoglobin (lower panel).

Proteolysis 

40 pmol of HMG protein was digested with trypsin in 10 µl of buffer, and the peptides were separated by reverse phase hplc on Spherisorb ODS2 C18 (5 µm, 1 × 100 mm) at 0.075 ml/min in 0.1% TFA with an acetonitrile gradient (data not shown). Fractions (70 µl) were analysed directly by MALDITOF mass spectrometry (0.2 µl) and sequence analysis (50-60 µl). The amino acid sequencing clearly identified the protein as the product of a pea leaf cDNA encoding an HMG-I-like protein (C. Webster and J. Gray, unpublished data). For each peptide identified from the sequence data, the measured mass matched the length expected from the DNA-inferred sequence (Fig. 4), but no N-terminal fragment was evident. The mass of the protein predicted from the DNA sequence (20470 Da) considerably exceeded that measured for the intact protein (approx. 17 kDa), showing this protein to be C-terminally processed.

Further digests were performed on small amounts of material with a view to identifying other segments of the protein sequence by mass-mapping. Approx. 2 pmol of HMG protein was digested with Lys-C proteinase in 2 µl of buffer. A sample (0.2 µl) was examined by MALDITOF mass spectrometry and the masses were scanned against the expected protein sequence. All significant peaks could be matched, and several derived from the C-terminal region of the protein. Although these data were not supported by sequence analysis, the fact that they matched overlapping regions of the sequence indicated that these assignments were probably correct (Fig. 4).

Figure 4. Mass-mapping of peptides against the DNA-derived sequence of HMG protein. Peptides were identified from their masses (error tolerance ±2 Da). Edman sequencing was performed on the tryptic peptides only; in two cases (residues 22→ and 141→), partial sequence data were obtained but no mass information was forthcoming.

Figure 5. Mass analysis of a Glu-C digest of HMG protein.

The experiment was then repeated, digesting 2 pmol of HMG protein with Glu-C proteinase; it was clear that not all the fragments could be matched with the protein sequence (Fig. 5). Two major peaks (1781.7 ± 0.9 Da and 1598.7 ± 0.8 Da) could not be identified and, since a fragment corresponding to Val6-Glu17 (close to the N-terminus) was also present, there was the hope that the Glu5-Val6 cleavage had been incomplete and that one of the unidentified fragments represented the full-length N-terminal peptide. If this were so, the 1598.7 Da peptide would have to correspond to Glu5-Glu17 with an unknown adduct of 117 Da attached at the N-terminus. A more reasonable hypothesis emerged from the 1781.7 Da peak. This could be explained by a peptide Thr3-Glu17, leaving an extra mass of 42.5 Da to be accounted for, which is consistent with N-terminal acetylation (+42 Da) of the Thr. It was therefore likely that the 1598.7 Da peak derived from another segment of the sequence; as it was unmatched, it could represent a peptide carrying a post-translational side-chain modification. Further work will be done to establish its identity.
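The reasoning above amounts to comparing an unexplained mass difference against known modification masses. A minimal sketch of such a lookup follows. The modification list and its average mass increments are standard values chosen for illustration, not a list used by the authors, and the tolerance is the ±0.9 Da quoted for the 1781.7 Da peak. At this accuracy the lookup returns both acetylation (+42.04 Da) and trimethylation (+42.08 Da), so mass alone cannot separate the two; further evidence for the acetyl assignment is therefore needed.

```python
# Average mass increments (Da) of some common modifications (illustrative list).
MODIFICATIONS = {
    "methylation": 14.0266,
    "oxidation": 15.9994,
    "formylation": 28.0101,
    "acetylation": 42.0367,
    "trimethylation": 42.0797,
    "phosphorylation": 79.9799,
}


def explain_delta(observed: float, expected: float, tolerance: float) -> list:
    """Return (name, increment) pairs whose mass fits observed - expected."""
    delta = observed - expected
    return [
        (name, inc)
        for name, inc in sorted(MODIFICATIONS.items(), key=lambda kv: kv[1])
        if abs(delta - inc) <= tolerance
    ]


if __name__ == "__main__":
    # 1781.7 Da peak versus unmodified Thr3-Glu17, which the text implies is
    # 1781.7 - 42.5 = 1739.2 Da.
    for name, inc in explain_delta(1781.7, 1739.2, tolerance=0.9):
        print(f"{name}: +{inc:.2f} Da")
```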

Further evidence for an acetyl-Thr N-terminus came from treatment of the Glu-C digest with Lys-C proteinase. This caused a loss of the unmatched fragments and the appearance of a new peptide of 786.95 ± 0.2 Da (Fig. 6). The mass difference between this and the 1781.7 Da peptide (994.75 ± 0.9 Da), plus 18 Da for the water added when the peptide bond is hydrolysed, predicts a complementary peptide of mass 1012.75 ± 0.9 Da. Given the protease specificities, the only candidate stretch of sequence with a similar mass that is bounded by a Lys and a Glu residue is Pro9-Glu17 (mass 1012.9 Da); no peak for this peptide appears, but its sequence (PLSLPPYPE) suggests it may be insoluble. The 786.95 Da fragment must therefore lie N-terminal to Pro9 and represent some altered form of the sequence. Again, this can only be explained by acetyl-TREVNK (expected mass 787.8 Da). Further work will be done to confirm this assignment by enzymological and chemical means.
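The arithmetic in this paragraph can be checked with a short calculation. The sketch assumes neutral average masses built from standard average residue masses; hydrolysing one peptide bond adds one water (18.02 Da), which is why the complementary fragment is predicted at the mass difference plus 18 Da. The helper names are hypothetical, and small differences from the quoted values reflect the mass tables used.

```python
# Standard average residue masses (Da) of the 20 common amino acids.
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153
ACETYL = 42.0367  # average mass added by N-terminal acetylation


def peptide_mass(seq: str, n_term_mod: float = 0.0) -> float:
    """Neutral average mass of a peptide, with an optional N-terminal modification."""
    return sum(AVG_RESIDUE[aa] for aa in seq) + WATER + n_term_mod


def complementary_fragment(parent: float, fragment: float) -> float:
    """Mass of the other product when one peptide bond of `parent` is hydrolysed."""
    return parent - fragment + WATER


if __name__ == "__main__":
    print(f"complement of 1781.7 and 786.95: {complementary_fragment(1781.7, 786.95):.2f} Da")
    print(f"acetyl-TREVNK: {peptide_mass('TREVNK', ACETYL):.2f} Da")
    print(f"acetyl-Thr3-Glu17: {peptide_mass('TREVNKPLSLPPYPE', ACETYL):.2f} Da")
```

With these mass values, acetyl-TREVNK comes out at about 787.9 Da, in line with the 787.8 Da quoted, and acetyl-Thr3-Glu17 at about 1782.0 Da, within the ±0.9 Da of the 1781.7 Da peak.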
Figure 6. Mass analysis of the Glu-C digest of HMG protein after further digestion with Lys-C proteinase.


DISCUSSION 

This work has revealed a number of important characteristics of the HMG protein. The limited sensitivity of amino acid sequence analysis required the use of the bulk of the material for digestion and isolation of internal peptides. However, a significant amount of information has been derived from microdigestion of only a few pmol of material. In particular, the likely nature of the N-terminal block has been established as an acetyl group on Thr3, and the heterogeneity seen in the mass analysis of the whole protein is supported by identification of the masses of several candidate C-terminal peptides. However, when the observed protein mass is compared with the peptide data, it is clear that further modifications are likely, as at least two C-terminally matched peptides fall short of the length anticipated from the whole-protein mass (arrowed, Fig. 7). Two internal peptides have a tentative assignment (Fig. 4) and may represent modified forms from elsewhere in the sequence, but even for identified peptides there may be undetected modified forms of the same sequence arising from incomplete post-translational modification. Coverage of the entire sequence by the mass-mapping approach should not be taken as an indication that no part of the sequence is post-translationally modified. So far, no such modifications have been identified conclusively, but different conditions of analysis have yet to be investigated; for example, highly phosphorylated peptides may not ionise in the positive-ion mode used in this work and would therefore remain undetected.

In this work it has become apparent that the use of several proteinases gives valuable information. Matching a particular stretch of sequence in several different digests increases confidence not only that the assignment is correct, but also that the sequence is unlikely to be present in a modified form. Not all the expected fragments will be recovered from any one proteinase digest, so the judicious use of several proteinases, either singly or in sequence, can significantly increase the success rate in identifying all regions of the protein. Recent advances in MALDITOF mass spectrometry, such as high-sensitivity ladder sequencing (Bartlet-Jones et al., 1995) and post-source decay analysis (Talbo & Mann, 1994), will greatly improve the confidence of assignments made through the mass-mapping approach and make the detection of post-translational modifications easier.
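The benefit of combining proteinases can be illustrated by an in-silico digestion. The sketch below applies simplified cleavage rules (trypsin after Lys/Arg but not before Pro; Lys-C after Lys; Glu-C after Glu, as is usual in ammonium bicarbonate buffer), keeps only peptides inside an assumed detectable mass window, and reports the fraction of residues covered. The rules, the 500-5000 Da window and the test sequence are assumptions for illustration, not the authors' procedure.

```python
# Standard average residue masses (Da) of the 20 common amino acids.
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153

# enzyme -> (residues cleaved after, whether cleavage is blocked before Pro)
RULES = {
    "trypsin": ("KR", True),
    "Lys-C": ("K", False),
    "Glu-C": ("E", False),
}


def digest(seq, enzyme):
    """Return (start, end) half-open index pairs of a complete digest."""
    sites, blocked_by_pro = RULES[enzyme]
    pieces, start = [], 0
    for i in range(len(seq) - 1):  # no cleavage after the final residue
        if seq[i] in sites and not (blocked_by_pro and seq[i + 1] == "P"):
            pieces.append((start, i + 1))
            start = i + 1
    pieces.append((start, len(seq)))
    return pieces


def peptide_mass(seq):
    return sum(AVG_RESIDUE[aa] for aa in seq) + WATER


def coverage(seq, enzymes, lo=500.0, hi=5000.0):
    """Fraction of residues covered by in-window peptides from all digests."""
    covered = set()
    for enzyme in enzymes:
        for start, end in digest(seq, enzyme):
            if lo <= peptide_mass(seq[start:end]) <= hi:
                covered.update(range(start, end))
    return len(covered) / len(seq)


if __name__ == "__main__":
    toy = "MATRKPAGEKSGSPKKAAKEVPLSLPPYPEGAVKKGRGRPKKE"  # hypothetical sequence
    for combo in (["trypsin"], ["Lys-C"], ["Glu-C"], list(RULES)):
        print(f"{'+'.join(combo)}: {coverage(toy, combo):.0%}")
```

Taking the union over digests means coverage can only grow as proteinases are added; regions that are too small or too large in one digest may fall inside the window in another.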


CONCLUSIONS 

MALDITOF mass spectrometry coupled with microdigestion enables a large amount of information to be derived from a few pmol of protein. Where a gene sequence is available, the ability to match masses with the protein sequence, together with the known specificity of proteinase digestion, obviates the need to support all data with sequence analysis, which requires much more material. However, the identification of modifications is not so straightforward, and the advent of sequence analysis on reflector instruments will be a valuable aid in this type of analysis.

INTRODUCTION

Many complex biological processes, such as cell growth and differentiation, are under the control of hormones, some of which act through membrane-bound receptors to alter the phosphorylation state, and hence the activity, of the proteins involved in the transformation. The arrival of only a few hormone molecules at a specific plasma membrane receptor produces, after several steps of amplification, the phosphorylation of a few billion target molecules. In order to study the phosphorylation states of the target proteins and to follow the steps leading to their phosphorylation, an extremely high-resolution separation method coupled with an equally sensitive detection method is needed.

Two-dimensional gel electrophoresis, as described by O'Farrell in 1975 (1), is currently the most highly resolving yet simple method for protein separation, allowing up to 3,000 proteins from a cell extract to be visualised on a single gel by silver staining. The tremendous resolving power and sensitivity of the technique, together with the ability to electroblot proteins onto an inert support for antibody detection or for identification by Edman sequencing, make it ideal for studying cell signalling, in which the phosphorylation state of tens of proteins must be followed simultaneously. Scanned images of stained 2D gels (or autoradiograms) of cells in different states can be analysed by computer and the changes in protein expression and phosphorylation state quantitated (2).

The main drawback in the construction of 2D gel maps has been the sensitivity of the methods used for identifying proteins. This has improved recently with the development of peptide mass fingerprinting: the identification of a protein in a database using a set of molecular masses of peptides generated by a specific digestion. This method was developed independently by ourselves and four other groups, published simultaneously in 1993 (3, 4, 5, 6, 7), and stressed as a means of linking two-dimensional gel databases to protein databases. The acquisition of a mass fingerprint takes ca. 5 minutes on a matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectrometer and 30 min for a capillary HPLC run on a quadrupole instrument, and requires only tens of femtomoles for detection. Since 200 or more proteins can be isolated in sufficient quantities for digestion (ca. 10 pmol) from a single experiment (running multiple gels and then digestions in parallel), mass fingerprinting is an excellent complement to 2D analysis, providing rapid identification of known proteins and unique tags for unknowns. Thus 2D electrophoresis in combination with mass spectrometry offers a systematic approach to the study of kinase cascades through the construction of 'cell maps'.


PROTEIN  IDENTIFICATION  BY  MASS  MAPPING 

Proteins can be identified by using a set of peptide fragment masses produced by a specific digestion to search a protein database in which each sequence has been replaced by a list of the theoretical masses of the fragments produced by that cleavage method. Methods for searching peptide mass databases derived from protein databases have been described by several groups. The search methods described are robust for digests that yield accurate masses (±0.5 amu) for five or more peptides, and usually give unequivocal results. Digests that produce only a few peptides, or in which the amount of material is so low that mass accuracy suffers, can produce inconclusive results, as can proteins that are not in the database. However, the major drawback of searching protein sequences, or protein sequences obtained by autotranslation of cDNA sequences, is that the vast amount of data being generated by the genome and cDNA sequencing projects is left untapped. Computerised extraction of the correct reading frame from genomic DNA sequences is possible, but the extraction of sequences is not always complete because of difficulties such as predicting the boundaries of small exons and introns, reading frame shifts, and the occurrence, within the introns of the gene for one protein, of sequences that code for another protein, amongst others. Potentially the most useful source of sequence information that is inaccessible to autotranslation is the rapidly increasing number of Expressed Sequence Tags (ESTs), short cDNA sequences obtained from randomly primed cDNA libraries (8). In release 37 of the EMBL database there are over 4,000 such sequences, coding on average for approximately 100-150 amino acids. Here we present methods using multi-dimensional searches that greatly increase the confidence level of identification, allowing DNA sequence databases to be examined.
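The n and k values reported by such searches (defined in the legend to Table 1) can be sketched as follows. This illustrates the counting only: it assumes a fixed ±0.5 Da tolerance, widens the searched mass range by the same tolerance, and ranks entries by k and then by n, which is an assumption of the sketch. It does not reproduce the MassSearch scoring function, and all masses and entry names are hypothetical.

```python
def count_n_k(theoretical, observed, tol=0.5):
    """n: theoretical peptides inside the observed mass range (widened by tol).
    k: observed masses matching at least one theoretical mass within tol."""
    lo, hi = min(observed) - tol, max(observed) + tol
    n = sum(1 for m in theoretical if lo <= m <= hi)
    k = sum(1 for o in observed if any(abs(o - m) <= tol for m in theoretical))
    return n, k


def rank_candidates(database, observed, tol=0.5):
    """Sort (name, n, k) rows by k descending, then n ascending (illustrative rule)."""
    scored = [(name, *count_n_k(masses, observed, tol)) for name, masses in database.items()]
    return sorted(scored, key=lambda row: (-row[2], row[1]))


if __name__ == "__main__":
    observed = [800.5, 1200.2, 2000.0]        # hypothetical measured masses (Da)
    database = {                               # hypothetical theoretical digests
        "entry_A": [500.3, 800.4, 1200.6, 2500.1, 3100.0],
        "entry_B": [801.9, 1450.7, 1999.8, 2210.4],
    }
    for name, n, k in rank_candidates(database, observed):
        print(name, "n =", n, "k =", k)
```

A large n with a small k means many theoretical peptides fall in the searched range but few were observed, which is why k alone is not a sufficient measure of confidence.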

One established MS technique that can greatly increase confidence levels in database searching is hydrogen-deuterium exchange. This has already been used to simplify the interpretation of MS/MS spectra for peptide sequencing (9). The number of exchangeable hydrogens in a peptide is sequence-dependent, so peptides with similar masses may be distinguished after exchange (A, F, G, I, L, M and V each have 1 exchangeable hydrogen; C, D, E, H, S, T, W and Y have 2; K, N and Q have 3; and R has 5). A second method for generating orthogonal data is to combine the results from two digestions using enzymes or chemicals with different cleavage specificities. The effect of using dual digestions and deuterium exchange on the certainty of the data obtained from a search is shown in Table 1.
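The exchangeable-hydrogen counts listed above translate directly into a predicted mass shift after exchange in D2O. The sketch below uses those per-residue counts and adds Pro with no exchangeable hydrogen (it has no backbone N-H). It also assumes two extra exchangeable hydrogens for the free termini (one more on the N-terminal amine, one on the C-terminal carboxyl) and one for the ionising proton; these terminal and charge corrections are assumptions of the sketch. It shows that AGGR and ANR, which are exactly isobaric because Gly-Gly and Asn share the formula C4H6N2O2, separate by about 1 Da after exchange.

```python
# Exchangeable hydrogens per residue, as listed in the text (Pro added: 0).
EXCHANGEABLE_H = {
    **dict.fromkeys("AFGILMV", 1),
    **dict.fromkeys("CDEHSTWY", 2),
    **dict.fromkeys("KNQ", 3),
    "R": 5,
    "P": 0,
}
D_MINUS_H = 2.014102 - 1.007825  # mass difference deuterium - protium (Da)


def exchangeable_hydrogens(seq, termini=2, charge_protons=1):
    """Count labile H atoms in a protonated peptide (termini/charge terms are assumptions)."""
    return sum(EXCHANGEABLE_H[aa] for aa in seq) + termini + charge_protons


def deuteration_shift(seq, **kwargs):
    """Mass increase (Da) on complete exchange in D2O."""
    return exchangeable_hydrogens(seq, **kwargs) * D_MINUS_H


if __name__ == "__main__":
    for pep in ("AGGR", "ANR"):  # identical elemental composition before exchange
        print(pep, exchangeable_hydrogens(pep), f"+{deuteration_shift(pep):.2f} Da")
```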


Table 1. Orthogonal data increase the confidence levels of searching. Digestions were carried out using 50 pmol of phenylalanine ammonia lyase and the peptide masses were accumulated by on-line HPLC-MS. The data were used to search the EMBL database using the program MassSearch. The results give a score, the number of peptides occurring in that protein between the lowest and highest masses used for the search (n), the number of experimentally determined masses that matched (k), the accession number (AC) and a description of the entry in the database (DE). In parts 2 and 3, one n, k pair is given for each of the two digests combined in the search.

1. Deuterated tryptic digestion of phenylalanine ammonia lyase

Score   n    k    AC       DE
80.9    18   4    X03237   Sheep mRNA for alpha-S1-casein. Ovis aries
80.3    39   6    X16772   P.crispum PAL-1 gene for phenylalanine ammonia-lyase exon 2. Petroselinum crispum
78.9    12   3    K03355   HSV1 (KOS) gene for 2.7 kb spliced mRNA, splice acceptor region. Herpes simplex virus type 1
77.4    22   4    X59836   C.hircus mRNA for alpha-s1-casein. Capra hircus (goat)
74.9    22   3    J02895   Pig non-histone chromosomal protein (HMG2) mRNA. Sus scrofa (domestic pig)
74.1    34   3    X17480   Chicken mRNA for arylamine N-acetyltransferase (NAT-3). Gallus domesticus (chicken)

2. Combined search using deuterated and non-deuterated digests

Score   n    k    n    k    AC       DE
162.0   39   5    39   6    X16772   P.crispum PAL-1 gene for phenylalanine ammonia-lyase exon 2. Petroselinum crispum
146.9   49   6    49   5    X17462   P.crispum RNA for PAL4, phenylalanine ammonia-lyase. Petroselinum crispum
134.7   10   2    10   3    L14214   Human chromosome 4 (clone p4-1630) STS4-1220. Homo sapiens (human)
124.6   39   4    39   4    L11747   Populus trichocarpa x Populus deltoides (hybrid) phenylalanine ammonia lyase (PAL)
114.8   13   3    13   2    V00567   Human messenger RNA fragment for the beta-2 microglobulin. Homo sapiens (human)
112.5   4    2    4    2    M63179   Human HLA-DR beta x gene, exon 2. Homo sapiens (human)

3. Combined search using tryptic and AspN digestions

Score   n    k    n    k    AC       DE
133.1   49   6    27   3    X17462   P.crispum RNA for PAL4, phenylalanine ammonia-lyase. Petroselinum crispum
133.0   39   5    17   2    X16772   P.crispum PAL-1 gene for phenylalanine ammonia-lyase exon 2. Petroselinum crispum
106.7   30   3    11   4    M32778   S.frugiperda insertion element IFP1.6 DNA, clone lambda 889. Spodoptera frugiperda
104.4   32   3    12   4    M32777   S.frugiperda insertion element IFP1.6 DNA, clone lambda 883. Spodoptera frugiperda
102.5   31   5    6    4    D00511   Rat mitochondrial acetoacetyl-CoA thiolase mRNA. Rattus norvegicus (rat)
101.8   32   3    14   4    M32775   AcNPV mutant with an S.frugiperda insertion element IFP1.6.


CELL SIGNALLING BY CALCIUM

In the resting state the cytoplasmic calcium concentration is of the order of 100 nM, several orders of magnitude lower than in the external milieu (ca. 3 mM, see Figure 1). Upon stimulation by an effector (such as a hormone or an electrical signal), a calcium spike is generated as Ca2+ is released from the endo/sarcoplasmic reticulum (ER/SR) and let in from outside the cell by Ca2+-specific channels, raising the concentration around 100-fold. The main transducer of this signal is calmodulin (CaM), which binds Ca2+ with an affinity (Kd) of about 10^-6 M. The Ca2+ signal is self-terminating: the rise in concentration activates the plasma membrane and ER Ca2+ pumps while the Ca2+ channels close, the net effect being a rapid return to resting cytosolic Ca2+ levels.




M.  Quadroni  et  al. 


Figure 1. A simplified scheme of calcium homeostasis in the cell.


The SR/ER pump transfers Ca2+ out of the cytosol into the ER/SR. It is regulated by a small, pentameric 5 kDa membrane protein termed phospholamban, which at low Ca2+ concentrations binds to the pump and inhibits it by physically blocking access to the active site. Raising the Ca2+ concentration above 0.5 µM and/or phosphorylating phospholamban with cAMP-dependent protein kinase or CaM-dependent protein kinase causes the complex to dissociate, releasing the inhibition (10). The plasma membrane pump contains a 30-residue stretch near its C-terminus that interacts with a region close to the active site, inhibiting the pump. The binding of CaM in response to increased Ca2+ levels relieves this inhibition (11). The internal inhibition can also be removed in other ways, such as by phosphorylation of the binding domain by protein kinase C or, as a last resort, by proteolytic removal of the CaM-binding domain by the Ca2+-activated protease calpain (as is the case in ageing cells).


APPLICATION  OF  MS  METHODS  TO  CELL  SIGNALLING 

Stimulation of cultured liver cells by bombesin has been shown to produce a long calcium transient. An initial rapid release of Ca2+ from internal stores is followed by a sustained second phase (>2 min) in which Ca2+ efflux is greatly reduced. Bombesin is known to increase the activity of several kinases, including casein kinase II (CKII) and non-receptor tyrosine kinases (12). These observations are puzzling: if the cell calcium concentration is high, why is CaM not binding to the plasma membrane ATPase and lowering the calcium concentration? A clue may come from our recent finding that calmodulin can be found in a phosphorylated state in vivo. Figure 2 shows the raw and deconvoluted spectra of rat liver CaM analysed by HPLC-MS. The different phosphorylation states can be seen as peaks separated by 80 mass units. The purified phosphoCaM behaved in an entirely different manner from normal CaM. It showed a much lower affinity for the plasma membrane ATPase, only partially activating it at the calcium concentrations seen in bombesin-stimulated cells. PhosphoCaM is also an extremely weak activator of CaM-dependent protein kinase and so cannot activate the SR calcium ATPase through phosphorylation of phospholamban. This suggests a pivotal role for CaM phosphorylation in prolonging the calcium signal.
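Assigning the peaks in Figure 2 amounts to dividing each mass offset from unmodified calmodulin by the phosphate increment (HPO3, about 79.98 Da, the "80 mass units" above) and checking that the remainder is small. The sketch below does this; the example offsets and the ±2 Da tolerance are hypothetical, not values read from the figure, and the function name is illustrative.

```python
PHOSPHATE = 79.9799  # average mass of HPO3 added per phosphorylation (Da)


def phosphate_count(mass_offset, tolerance=2.0):
    """Return the number of phosphates that explains a mass offset, or None."""
    n = round(mass_offset / PHOSPHATE)
    if n < 0 or abs(mass_offset - n * PHOSPHATE) > tolerance:
        return None
    return n


if __name__ == "__main__":
    # Hypothetical deconvoluted peak offsets relative to non-phosphorylated CaM (Da)
    offsets = [0.3, 80.6, 159.2, 240.5, 319.1, 120.0]
    for off in offsets:
        print(f"{off:7.1f} Da -> {phosphate_count(off)}")
```

Returning None for an offset that no whole number of phosphates explains (such as 120 Da here) flags a peak that needs another explanation, for example a different modification.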
Figure 2. Deconvoluted mass spectrum of phosphocalmodulin. 50 pmol of HPLC-purified phosphocalmodulin was infused at 3 µL/min with a sheath liquid of methoxyethanol into a Finnigan MAT TSQ700 mass spectrometer. The spectrum was accumulated for 1 minute, scanning from 1000 to 2000 m/z in 3 seconds, and deconvoluted with software supplied by the manufacturer. The peaks are labelled relative to the mass of non-phosphorylated calmodulin and clearly show the presence of 0, 1, 2, 3 and 4 phosphates (mass increment 80).


The next question was obvious: which kinase was responsible for this phosphorylation? To get an idea, we decided to determine the sites of phosphorylation. The phosphoprotein was digested with CNBr and subsequently with trypsin (trypsin alone was insufficient, since CaM becomes refractory to digestion after phosphorylation). The resultant peptide mixture was separated by HPLC and the effluent analysed by automated on-line MS/MS. The spectrum of the singly phosphorylated tryptic fragment T8 and the relative ion current are shown in Figure 3. The automated MS/MS HPLC run identified the three sites of modification as Ser79, Thr81 and Ser101. These sites are canonical targets of


Figure 3. Automated MS/MS of a CNBr/tryptic
digest of phosphocalmodulin. 100 pmol of
phosphocalmodulin was digested in the dark
under argon with 200 mM CNBr. The solution
was diluted with water and lyophilised before
digestion with trypsin overnight at 37 °C. The
digest was acidified to pH 2.0, injected onto a
self-packed C-18 reverse phase column
(0.375 mm × 10 cm) and eluted with a gradient
from 0.1% TFA in water to 0.08% TFA in 70%
acetonitrile over 30 min. We have written an
instrument control language program, Rubber,
which scans the third quadrupole (in the
presence of collision gas but with no collision
offset voltage) and selects ions intense enough
for sequencing. The instrument then switches
to MS/MS mode, sets the collision energy
according to the parent mass and accumulates
6 scans. It then reverts to normal MS mode to
identify further candidates for MS/MS. The
figure shows the ion current in the lower panel
and the MS/MS spectrum accumulated for the
phosphopeptide indicated by the arrow. The
sequence ions are shown above the spectra.
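The acquisition loop described for Rubber amounts to: survey scan, pick intense ions not yet fragmented, run MS/MS at a mass-dependent collision energy, and return to survey mode. Rubber itself was written in the instrument control language; the Python sketch below is only a hypothetical model of that selection logic. The intensity threshold, the exclusion window and the linear collision-energy rule are illustrative assumptions, not the program's actual parameters; only the six accumulated scans come from the legend.

```python
# Hypothetical model of data-dependent precursor selection (not Rubber's code).
from dataclasses import dataclass, field


@dataclass
class PrecursorSelector:
    intensity_threshold: float = 1e5   # assumed minimum intensity worth sequencing
    exclusion_tol_mz: float = 1.0      # assumed window for "already fragmented"
    scans_per_precursor: int = 6       # from the figure legend
    fragmented: list = field(default_factory=list)

    def collision_energy(self, parent_mz):
        # Assumed rule: energy rises linearly with parent m/z.
        return 10.0 + 0.03 * parent_mz

    def select(self, survey_scan):
        """survey_scan: list of (mz, intensity) pairs from a normal MS scan.
        Returns MS/MS jobs for intense, not-yet-fragmented ions, most intense first."""
        jobs = []
        for mz, intensity in sorted(survey_scan, key=lambda peak: peak[1], reverse=True):
            if intensity < self.intensity_threshold:
                continue
            if any(abs(mz - done) <= self.exclusion_tol_mz for done in self.fragmented):
                continue
            self.fragmented.append(mz)
            jobs.append({
                "parent_mz": mz,
                "collision_energy": self.collision_energy(mz),
                "scans": self.scans_per_precursor,
            })
        return jobs
```

Remembering already fragmented ions is what lets the instrument move on to "further candidates" as the HPLC effluent changes, instead of repeatedly sequencing the most intense peptide.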


Figure 4. Two-dimensional gel of baby hamster
kidney cells grown in culture. The box
indicates the position of non-, mono- and
diphosphorylated calmodulin.


casein kinase II. We could reproduce the same pattern of phosphorylation using CaM
phosphorylated in vitro by casein kinase II from rat liver.

To trace the signalling pathway backwards and confirm casein kinase II
as the kinase acting in vivo, we started analysing various cell types in culture, stimulating
them with a variety of hormones and running two-dimensional gels to see whether CaM was
becoming phosphorylated (see Figure 4). If CaM was being phosphorylated by casein kinase
II, then a second spot should be visible in the autoradiograms, that of the beta subunit of
casein kinase II, since this kinase undergoes autophosphorylation when active. By taking a
series of 'snapshots' (two-dimensional gels at various times after stimulation with an
effector), we hope to trace the signalling pathway backwards to see what is
stimulating casein kinase II in the cell. Critical to these experiments are the speed and
sensitivity of mass mapping for identifying proteins from two-dimensional gels.

Database searching from a mass profile is offered as a free service by an automatic
server at the ETH, Zurich. For information, send an electronic message to
cbrg@inf.ethz.ch with the line: help mass search, or help all. An experimental World Wide
Web server has been set up at http://cbrg.inf.eth.ch; it requires a client with
forms capability.
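Searching from a mass profile rests on peptide mass fingerprinting: masses measured from a protein's digest are compared with the masses predicted for each database entry. The ETH server's actual scoring scheme is not described in the text; the Python sketch below shows only the generic idea, assuming a simple trypsin rule (cleave after K or R unless followed by P), monoisotopic [M+H]+ masses, no missed cleavages or modifications, and a plain match count as the score. The toy sequences in the test are hypothetical.

```python
# Sketch: generic peptide mass fingerprint matching (not the ETH server's algorithm).
MONOISOTOPIC = {  # residue masses, Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONOISOTOPIC[r] for r in peptide) + WATER + PROTON


def match_count(observed, sequence, tol=0.5):
    predicted = [mh_plus(p) for p in tryptic_peptides(sequence)]
    return sum(1 for m in observed if any(abs(m - p) <= tol for p in predicted))


def rank_entries(observed, database, tol=0.5):
    """database: {name: sequence}. Returns (name, matches) pairs, best first."""
    scores = [(name, match_count(observed, seq, tol)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Real search services weight matches (for example by peptide mass frequency in the database) rather than simply counting them; the count is used here only to keep the idea visible.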
INTRODUCTION

Carbohydrate composition analysis of glycoproteins is analogous to the amino acid
analysis of proteins and is therefore a fundamental part of glycobiology. Based on current
knowledge, an accurate composition analysis can give some insight into the type and extent
of glycosylation and the probable structures of the oligosaccharides present in an unknown
glycoprotein. Identification and quantitation of the individual monosaccharides present in
the purified oligosaccharides is a prerequisite for structural determination. A review of some
useful methods for carbohydrate composition analysis has been published recently (Townsend,
1993). Analysis of monosaccharides and oligosaccharides by high performance anion exchange
chromatography with pulsed amperometric detection (HPAE-PAD) has gained some popularity in
recent years, since this technique does not involve any derivatization prior to analysis.
However, it requires dedicated hardware, such as the carbohydrate analysis system
commercially available from Dionex. In addition, PAD is not a truly specific detector for
carbohydrates: amino acids, peptides and thiol groups in the glycoprotein hydrolysate
interfere with the monosaccharide analysis.

Two novel reversed phase high performance liquid chromatographic (HPLC) methods
with fluorescence detection are described in this report for complete carbohydrate
(monosaccharide and sialic acid) composition analysis. The monosaccharides and the sialic
acids were labeled using very simple derivatization techniques to yield highly stable
fluorescent derivatives. These methods have been validated for accuracy and reproducibility
using both standards and glycoproteins. Composition analysis of glycoproteins by
these methods is comparable to HPAE-PAD and, unlike HPAE-PAD, has no known
complications from interference in the monosaccharide analysis.


METHODS 

Hydrolysis  of  Glycoproteins 

Glycoproteins (5-50 µg) were hydrolyzed in 0.25-0.5 ml of 20% trifluoroacetic acid
in 1.6-ml conical screw cap freeze vials (polypropylene with O-ring seals, Sigma) at 100
°C for 6-7 hours (Anumula, 1994). The caps on the vials were further secured with
4-5 layers of Teflon tape to prevent accidental evaporation of the sample during
hydrolysis. Samples were dried overnight in a vacuum centrifuge evaporator (Savant)
without heat. For hexosamine analysis specifically, the glycoproteins were hydrolyzed in
0.05-0.1 ml of 4 N HCl at 100 °C for 16 hours and dried in a vacuum centrifuge evaporator.

Derivatization  of  Monosaccharides  with  Anthranilic  Acid  (ABA) 

A solution of 4% sodium acetate·3H2O and 2% boric acid in methanol was prepared
by shaking vigorously in a graduated cylinder with a glass stopper. This solution may be
stored at room temperature for several months. The derivatization reagent was prepared by
dissolving 30 mg of anthranilic acid (Aldrich) and 20 mg of sodium cyanoborohydride
(Aldrich) in 1.0 ml of the methanol-sodium acetate-borate solution. Dry glycoprotein
hydrolysates were dissolved in freshly prepared 1% sodium acetate·3H2O (0.1-0.2 ml), and an
aliquot (20-100 µl) was transferred to a new screw cap freeze vial. Samples were mixed with
0.1 ml of the anthranilic acid reagent solution and capped tightly. The vials were heated at
80 °C in an oven or heating block (Reacti-Therm, Pierce) for 30-45 minutes (Anumula, 1994).
After the vials had cooled to ambient temperature, the sample volume was made up to 1.0 ml
with HPLC solvent A and mixed vigorously on a vortex mixer to expel the hydrogen evolved
during the reaction. Duplicate 50 µl injections were made from each vial for analysis.
The monosaccharide standards were derivatized in the same way, to contain 20-25 pmol each per
injection, and were derivatized fresh each time unknown samples were analyzed.
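The protocol's reagent strengths and dilutions reduce to a few lines of arithmetic. The Python sketch below uses standard molar masses (137.14 g/mol for anthranilic acid, 62.84 g/mol for sodium cyanoborohydride, not stated in the text), ignores the volume change on dissolving, and picks one illustrative combination of the volume ranges given above.

```python
# Worked arithmetic for the ABA derivatization step (standard molar masses).
ANTHRANILIC_ACID_MW = 137.14  # g/mol
NABH3CN_MW = 62.84            # g/mol, sodium cyanoborohydride


def molarity(mg, ml, molar_mass):
    """mol/L for mg of solute in ml of solvent (volume change on dissolving ignored)."""
    return (mg / molar_mass) / ml  # mmol / mL == mol / L


def fraction_injected(redissolve_ul, aliquot_ul, final_ul, injection_ul):
    """Fraction of the original hydrolysate that ends up in one injection."""
    return (aliquot_ul / redissolve_ul) * (injection_ul / final_ul)


aba_m = molarity(30, 1.0, ANTHRANILIC_ACID_MW)        # about 0.22 M
cyanoborohydride_m = molarity(20, 1.0, NABH3CN_MW)    # about 0.32 M
# Illustrative case: 200 µL redissolved, 100 µL aliquot, made up to 1000 µL, 50 µL injected.
example_fraction = fraction_injected(200, 100, 1000, 50)
```

With 0.1 ml of reagent per vial, each reaction contains roughly 22 µmol of anthranilic acid, a very large excess over picomole-to-nanomole amounts of sugar; in the illustrative case, each injection carries 2.5% of the hydrolysate.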

HPLC  Analysis  of  ABA-Monosaccharide  Derivatives 

The ABA derivatives of the monosaccharides were separated on a C-18
reversed phase HPLC column (Bakerbond, 5 µm, 0.46 × 25 cm, analytical, J.T. Baker; or
Beckman Ultrasphere-ODS, 0.46 × 25 cm) using a 1-butylamine-phosphoric acid-tetrahydrofuran
mobile phase. All separations were carried out at ambient temperature at a
flow rate of 1 ml/min. Solvent A consisted of 0.15-0.3% 1-butylamine, 0.5% phosphoric acid
and 1% tetrahydrofuran (0.025% BHT inhibited, Aldrich) in water, and solvent B consisted
of equal parts of solvent A and acetonitrile. For the Bakerbond column, solvent A contained
0.2% butylamine and the gradient program was 5% B isocratic for 25 min followed by
a linear increase to 15% B at 50 min. For the Beckman column, solvent A contained 0.15%
butylamine and the gradient program was 5% B isocratic for 30 min followed by a
linear increase to 15% B at 50 min. The column was then washed for 15 minutes with 100% B
and equilibrated for 15 min at the initial conditions to ensure run-to-run reproducibility.
ABA derivatives were detected with an HP 1046A HPLC fluorescence detector using 230
nm excitation and 425 nm emission (Anumula, 1994).
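The two gradient programs are easiest to compare as piecewise-linear %B profiles. Because solvent B is a 1:1 mixture of solvent A and acetonitrile, the acetonitrile content at any time is half the %B value. A small Python sketch, assuming linear interpolation between the stated time points and treating the 15 min wash and 15 min re-equilibration as fixed additions to the cycle time:

```python
# Sketch: %B and acetonitrile content for the two ABA gradient programs.
# Programs are (time_min, percent_B) breakpoints with linear segments between them.
BAKERBOND = [(0.0, 5.0), (25.0, 5.0), (50.0, 15.0)]
ULTRASPHERE = [(0.0, 5.0), (30.0, 5.0), (50.0, 15.0)]
WASH_MIN, EQUILIBRATION_MIN = 15.0, 15.0
ACN_FRACTION_OF_B = 0.5  # solvent B = equal parts solvent A and acetonitrile


def percent_b(t, program):
    """%B at time t (min); holds the first/last value outside the program."""
    if t <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return program[-1][1]


def percent_acetonitrile(t, program):
    return percent_b(t, program) * ACN_FRACTION_OF_B


def cycle_time(program):
    """Gradient length plus the wash and re-equilibration steps."""
    return program[-1][0] + WASH_MIN + EQUILIBRATION_MIN
```

At the end of either gradient (15% B) the eluent therefore contains only 7.5% acetonitrile, and each injection occupies about 80 min of instrument time.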
Determination of Sialic Acids

Fifty µL of sample containing 0.01 to 0.25 mg of protein was mixed with 50 µL
of 0.5 M sodium bisulfate in a 1.6-ml conical screw cap freeze vial. The vials were capped
tightly and incubated at 80 °C (Reacti-Therm heating module, Pierce) for
20 minutes. These mild acid hydrolysis conditions were satisfactory for releasing sialic
acids from the glycoproteins.

The sialic acids released by mild acid were then derivatized with
o-phenylenediamine·2HCl (OPD, Aldrich). Sialic acid standard solutions (1-2 nmol in 0.1 ml)
in separate 1.6-mL screw cap freeze vials and the unknown samples from mild acid hydrolysis
were each mixed with 0.1 mL of OPD solution (20 mg/ml in 0.5 M NaHSO4). The vials were
capped tightly and incubated at 80 °C (Reacti-Therm) for 40 minutes. After
cooling to room temperature, the samples were diluted to 1.0 mL with solvent A and mixed
vigorously on a vortex mixer. The vials were centrifuged at maximum speed for 3-5 minutes
in a microcentrifuge to clarify the solution, and the clear supernatant was used for
the analysis. Sialic acids were also derivatized with 4,5-dimethyl-1,2-phenylenediamine
(DMPD) as described earlier (Anumula, 1994a).

HPLC  Analysis  of  OPD-Sialic  Acid  Derivatives 

The derivatized sialic acid standards and samples were transferred into separate
autoinjector vials, and two 0.1 mL injections were made from each vial. The sialic acid
derivative was separated on a C18 reversed phase column (Ultrasphere-ODS, Beckman) at
ambient temperature at a flow rate of 1.0 mL/min. The HPLC solvents were prepared
as described for the monosaccharide analysis; solvent A for this column contained 0.15%
butylamine. The sialic acid derivatives were eluted isocratically with 13% solvent B for 15
min, followed by a 10 min wash with 95% solvent B and 10 min of re-equilibration with 13%
solvent B. The fluorescence detector conditions were the same as described earlier for the
monosaccharide analysis (Anumula, 1994), and the derivatives were quantitated using 230 nm
excitation and 425 nm emission wavelengths.
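The sialic acid protocol's concentrations and sample load can be checked with the same kind of arithmetic. The Python sketch below assumes, as the text reads, that the whole 100 µL hydrolysate is mixed with OPD and diluted to 1.0 mL, and uses the standard molar mass of OPD·2HCl (181.06 g/mol, not given in the text).

```python
# Worked arithmetic for the sialic acid protocol (volumes in µL).
OPD_2HCL_MW = 181.06  # g/mol, o-phenylenediamine dihydrochloride


def diluted(conc, v_added_ul, v_total_ul):
    """Concentration after adding v_added_ul of stock to reach v_total_ul."""
    return conc * v_added_ul / v_total_ul


# Mild hydrolysis: 50 µL sample + 50 µL of 0.5 M NaHSO4.
bisulfate_during_hydrolysis_m = diluted(0.5, 50, 50 + 50)

# OPD reagent at 20 mg/mL, i.e. 20 g/L, expressed as mol/L.
opd_reagent_m = 20.0 / OPD_2HCL_MW

# Derivatized sample diluted to 1000 µL; each injection is 100 µL.
fraction_per_injection = 100 / 1000


def protein_per_injection_ug(protein_mg_in_sample):
    """Protein equivalent (µg) represented by one injection."""
    return protein_mg_in_sample * 1000 * fraction_per_injection


SIALIC_CYCLE_MIN = 15 + 10 + 10  # isocratic run + wash + re-equilibration
```

Under these assumptions, hydrolysis runs in 0.25 M bisulfate, the OPD reagent is about 0.11 M, and each injection represents 1-25 µg of protein for the stated 0.01-0.25 mg sample range, with a 35 min cycle per injection.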


CHARACTERIZATION  OF  GLYCOPROTEINS 

As shown in Figure 1, the characterization of glycoproteins is a rather involved
process and requires considerable time and effort to determine the oligosaccharide
structures specifically associated with the individual glycosylation sites. Structure
determination involves identification and quantitation of the individual monosaccharides in
addition to the use of various physico-chemical techniques (see the mini reviews by Kobata,
1992; Lee, 1992; and Lee et al., 1990). Common steps involved in determining oligosaccharide
structures by various techniques are shown in Table 1.

It is beyond the scope of this paper to review all the methods in detail; experimental
protocols for some of these techniques may be found elsewhere (Hounsell, 1993). Notably,
the analysis of carbohydrates by capillary electrophoresis has been reviewed recently
(Novotny and Sudor, 1993; Oefner and Chiesa, 1994). Complete release of oligosaccharides
(N-linked and O-linked) from glycoproteins by hydrazinolysis is the method of choice
(Patel et al., 1993), but it poses considerable challenges in the fractionation of the various
oligosaccharides. For example, a mixture of high mannose, complex and O-linked
oligosaccharides cannot be resolved easily. Therefore, O-linked and high mannose
oligosaccharides are released separately by specific methods prior to the release of complex
oligosaccharides. On the other hand, it is rare to find a glycoprotein carrying high mannose,
complex and O-linked oligosaccharides together. At present, HPAE-PAD remains the method of
choice for the fractionation of oligosaccharides and the analysis of monosaccharides, once
the system operates satisfactorily (Anumula and Taylor, 1991). Even then, the monosaccharide
analysis of certain glycoproteins, for example IgG, can be problematic owing to severe
interference of amino acids and peptides with various monosaccharides (Anumula, 1994).
However, for determining the oligosaccharide map of glycoproteins, HPAE-PAD is perhaps
the best method at this time. A tentative identification of the oligosaccharide structures can
be made by comparing their retention times with those of reference standard
oligosaccharides and from the behavior of oligosaccharides on the HPAE-PAD column
(Anumula and Taylor, 1991; Lee, 1990; Townsend and Hardy, 1990). The implied structures
must then be confirmed by independent methods, as described in Table 1.


MONOSACCHARIDE  ANALYSIS 

With the aim of developing simple, easily operated methods that use common
equipment to determine the carbohydrate composition of glycoproteins, we developed two new
methods with fluorescence detection for high sensitivity. The monosaccharides are


Quantitative  Determination  of  Monosaccharides  and  Sialic  Acids 




Table 1. Various steps involved in the structure determination of carbohydrates

Steps involved

1. Release of sugar chains from glycoproteins
   a. Enzymatic
      PNGase F, PNGase A, Endo Fs, Endo H
      O-glycanase
   b. Chemical, e.g., hydrazine, NaOH/NaBH4
2. Isolation of oligosaccharides
   a. HPLC (normal and reversed phase and ion exchange, etc.)
   b. Gel permeation chromatography
   c. Lectin chromatography
   d. TLC
3. Fractionation/purification of oligosaccharides
   a. HPAE-PAD
   b. Bio-Gel P-4 chromatography
   c. HPLC (normal and reversed phase and ion exchange, etc.)
4. Determination of sugar composition
   a. HPAE-PAD
   b. GC/CE
   c. HPLC-fluorescence detection
   d. HPLC-UV detection
   e. MS
5. Inter-sugar linkage determination
   a. Methylation analysis by GC-MS
   b. NMR
   c. MS (where applicable)

Comments (steps 1-5)

   Have specificities for peptides and oligosaccharides
   Cleaves only the neutral O-linked disaccharides
   Degradation products may be formed
   Interference with monosaccharides in hydrolysates
   Single best method
   Structural and conformational information with large amounts
   Not widely used
   Requires derivatization
   Not as sensitive as fluorescence
   Does not identify monosaccharides
   Simple and inexpensive technique
   No derivatization
   Derivatization for sensitivity

6. Sequential exo/endo glycosidase digestions and product analysis for anomeric sequence determination
   a. HPAE-PAD: change in retention time
   b. Bio-Gel P-4 chromatography: change in elution volume
   c. HPLC methods: change in retention time
   d. MS: mass change in the oligosaccharide
7. Specific chemical degradations (where applicable)
   a. Periodate oxidation: cleaves between vicinal hydroxyls
   b. Acetolysis: cleaves at 1-6 glycosidic bond
   c. Nitrous acid deamination: cleaves at sugar amines


derivatized with 2-aminobenzoic acid (ABA, anthranilic acid) and the sialic acids with
o-phenylenediamine·2HCl (OPD), as shown in Schemes 1 and 2.

Fluorescence detection is the most sensitive way of quantitating the sugar derivatives;
with these fluorescent tags, ~0.1 pmol of the monosaccharides and <2 pmol
of the sialic acids can be easily determined. For convenience, both methods use the same
solvent systems and detector conditions. Although the same column can be used
for both the monosaccharide and the sialic acid determinations, it is advisable to use a
Brownlee Spheri-5 RP-18 cartridge (0.46 × 10 cm) for estimating the common sialic acids,
to save the time otherwise spent cleaning (4-5 blank runs) a column previously used for
monosaccharide analysis.
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^^C-CH-CH-CH-CH-CH20H 

H 


1 


COOH 


Scheme  1.  Derivatization  of  carbohydrates  with  2-aininoben2oic  acid  (ABA,  anthranilic  acid)  using  galactose 
as  an  example. 


Analysis  of  Monosaccharides  as  ABA  Derivatives 

Both neutral and amino sugar residues can be derivatized with 2-aminobenzoic acid
(ABA) in the presence of cyanoborohydride to yield highly fluorescent, stable derivatives
for quantitative determination. A mixture of standard monosaccharides consisting of
glucosamine, galactosamine, galactose, mannose and fucose was derivatized to give 20-25
pmol per injection. The monosaccharide derivatives were completely separated on the C-18
Bakerbond column using the butylamine-phosphoric acid system (Anumula, 1994).
Tetrahydrofuran (inhibited) was used to improve the resolution of the sugar derivatives.
Separation of the amino sugars from the excess reagent depends on the starting mobile phase
composition, in particular the ratio of butylamine to phosphoric acid. Typical
chromatograms obtained with the C18 Bakerbond and Ultrasphere-ODS columns are shown in Figure 2.

Only the Ultrasphere-ODS column was suitable for analyzing both monosaccharides
and sialic acids with the same solvent systems. Optimum conditions for derivatization and
HPLC separation were the same as described earlier (Anumula, 1994). The results obtained
with these methods for the complete carbohydrate analysis of the recombinant IgG (rIgG)
expressed in CHO cells are shown in Table 2.
Scheme 2. Derivatization of sialic acids with o-phenylenediamine (OPD) using N-acetylneuraminic acid as
an example.

Figure 2. Typical chromatograms obtained with Ultrasphere-ODS and C-18 Bakerbond columns.


Table 2. Complete carbohydrate composition of a
rIgG determined by the fluorescence methods

Monosaccharide   rIgG, HPLC^a       rIgG, corrected^b
                 (mol/mol^c)        (mol/mol^c)
Glucosamine      6.64               7.95 (8.0)^d
Galactose        1.20               1.44 (1.47)
Mannose          5.02               6.01 (6.0)
Fucose           1.85               2.20 (2.0)
Sialic acid      0.10^e             0.07^f

^a Values obtained by HPLC analysis.
^b Values corrected for the recovery of monosaccharides after hydrolysis in 20% TFA at
100 °C for 7 hours. Recovery of the monosaccharides was 83.5% of the expected values
for well characterized glycoproteins.
^c Calculated based on a polypeptide molecular weight of 146,273 Da.
^d Expected values (in parentheses) based on the oligosaccharide map for rIgG.
^e Determined by a modified thiobarbituric acid assay procedure.
^f Determined by the OPD HPLC fluorescence method.
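Footnotes b and c imply two simple calculations: picomoles of sugar divided by picomoles of protein (from the 146,273 Da polypeptide mass), followed by division by the 83.5% recovery. A minimal Python sketch under those assumptions; it reproduces the corrected glucosamine, galactose and mannose entries to two decimals. It applies only to the TFA-hydrolyzed neutral and amino sugars, not to sialic acids, which are released by mild acid.

```python
# Sketch: recovery correction and mol/mol conversion behind Table 2.
RECOVERY = 0.835        # footnote b
RIGG_MW_DA = 146_273    # footnote c, polypeptide molecular weight


def protein_pmol(mass_ug, molecular_weight=RIGG_MW_DA):
    """pmol of protein in mass_ug micrograms (1 µg = 1e6 pg; pg / (g/mol) = pmol)."""
    return mass_ug * 1e6 / molecular_weight


def mol_per_mol(sugar_pmol, protein_mass_ug, molecular_weight=RIGG_MW_DA):
    """Residues of a sugar per protein molecule."""
    return sugar_pmol / protein_pmol(protein_mass_ug, molecular_weight)


def recovery_corrected(measured_mol_per_mol, recovery=RECOVERY):
    """Scale a measured value up for losses during hydrolysis."""
    return measured_mol_per_mol / recovery
```

For example, 10 µg of rIgG corresponds to about 68.4 pmol of protein, so 6.64 mol/mol of glucosamine would appear as roughly 454 pmol before any recovery correction.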
Figure 3. HPAE-PAD oligosaccharide map
of a CHO cell-derived recombinant IgG. The
map of hydrazine-released oligosaccharides
was determined as described earlier
(Anumula, 1994). ISA and OSA indicate where
oligosaccharides with one sialic acid
and without any sialic acid typically elute.
The hydrazinolysis artifact peak is indicated by
an asterisk. For assigned structures see
Figure 4.


As indicated in Table 2, after correction for the degradation of sugar residues during
hydrolysis, the composition was in good agreement with that expected from the
oligosaccharide map and the associated structures. The oligosaccharide map of the
rIgG and the associated structures are shown in Figures 3 and 4.

The expected carbohydrate composition of the rIgG was calculated from the relative
ratios of the oligosaccharide peaks and their structures. The oligosaccharide map of the rIgG
consisted of >95% neutral complex biantennary oligosaccharides with peripheral
heterogeneity. Oligosaccharide structures were determined by HPAE-PAD following digestions with
exoglycosidases and from the specificity of endo-β-N-acetylglucosaminidases Fs (Anumula,
1993). HPAE-PAD responses for these oligosaccharides were assumed to be equal in calculating
the expected carbohydrate composition. A typical recovery of monosaccharides was
83-85% for the glycoproteins examined so far, and this value was used in determining the
actual composition of the glycoproteins. This is the first accurate estimate of the loss
in monosaccharide recovery, determined using the composition predicted from
the oligosaccharide map of a recombinant IgG produced in a CHO cell line. Reproducibility
data for the monosaccharide standards and the rIgG are shown in Tables 3 and 4.
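The expected composition is a peak-area-weighted average of the residue counts of the assigned structures, using the paper's assumption of equal HPAE-PAD response. The Python sketch below assumes core-fucosylated biantennary structures carrying zero, one or two galactoses (labelled G0F, G1F, G2F here) and two N-glycans per IgG molecule (one per heavy chain). The structure list and the peak areas in the test are hypothetical illustrations, not the rIgG data, which are shown only graphically in Figures 3 and 4. Glucosamine in Table 2 corresponds to the N-acetylglucosamine residues counted here.

```python
# Sketch: expected mol/mol composition from an oligosaccharide map.
# Residue counts per glycan for hypothetical core-fucosylated biantennary structures.
STRUCTURES = {
    "G0F": {"GlcNAc": 4, "Man": 3, "Gal": 0, "Fuc": 1},
    "G1F": {"GlcNAc": 4, "Man": 3, "Gal": 1, "Fuc": 1},
    "G2F": {"GlcNAc": 4, "Man": 3, "Gal": 2, "Fuc": 1},
}


def expected_composition(peak_areas, glycans_per_molecule=2):
    """peak_areas: {structure: relative HPAE-PAD area}; equal molar response assumed.
    Returns residues per protein molecule."""
    total = sum(peak_areas.values())
    composition = {}
    for name, area in peak_areas.items():
        share = area / total
        for residue, count in STRUCTURES[name].items():
            composition[residue] = (composition.get(residue, 0.0)
                                    + glycans_per_molecule * count * share)
    return composition
```

Only galactose depends on the peak ratios in this model; GlcNAc, mannose and fucose are fixed at 8, 6 and 2 per molecule, which matches the expected values in parentheses in Table 2.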

The procedures were highly reproducible, with relative standard deviations of less than
3.5%. The method for monosaccharide analysis reported here is superior to
methods using either 2-aminopyridine (Hase, 1993) or 4-aminobenzoic acid ethyl ester
(Kwon and Kim, 1993) for derivatization, since the current procedure does not require
separating the excess reagent from the derivatives by either extraction or gel filtration.
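The reproducibility figures in Tables 3 to 5 are mean peak areas and percent relative standard deviations (%RSD) over six injections. A minimal Python sketch using the sample standard deviation (n - 1); the paper does not state which convention it used. With this convention the rIgG sialic acid column of Table 5 gives 2.65%, as listed, while the glucosamine column of Table 3 gives 0.56% against the 0.55% printed.

```python
# Sketch: mean and %RSD of replicate injections (sample standard deviation, n - 1).
from statistics import mean, stdev


def percent_rsd(values):
    return 100.0 * stdev(values) / mean(values)


# Peak areas copied from Table 3 (glucosamine) and Table 5 (rIgG sialic acid).
glucosamine_std = [11.15, 11.11, 11.28, 11.24, 11.16, 11.21]
rigg_sialic = [2.949, 2.989, 3.019, 3.117, 3.149, 2.981]

for label, areas in [("GlcN standard", glucosamine_std), ("rIgG sialic acid", rigg_sialic)]:
    print(f"{label}: mean {mean(areas):.3f}, %RSD {percent_rsd(areas):.2f}")
```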


Table 3. Reproducibility of monosaccharide standards
(25 pmol each) derivatized with anthranilic acid

HPLC           Peak area (×10^…)
               Glucosamine   Galactose   Mannose   Fucose
Injection 1    11.15         3.954       4.177     5.474
Injection 2    11.11         3.913       4.140     5.474
Injection 3    11.28         3.995       4.211     5.575
Injection 4    11.24         3.967       4.225     5.529
Injection 5    11.16         3.971       4.195     5.481
Injection 6    11.21         3.996       4.180     5.500
Mean area      11.19         3.966       4.188     5.505
%RSD           0.55          0.77        0.71      0.72
Figure 4. Oligosaccharide structures assigned to the oligosaccharide map. SA, sialic acid; F, fucose; Bi,
biantennary complex type; Gn, N-acetylglucosamine; G, galactose; CR, trimannosyl chitobiose core.


Table 4. Reproducibility of rIgG monosaccharides derivatized
with anthranilic acid

HPLC           Peak area (×10^…)
               Glucosamine   Galactose    Mannose       Fucose
               (46 pmol)     (6.3 pmol)   (40.8 pmol)   (11.5 pmol)
Injection 1    20.75         0.975        6.100         2.413
Injection 2    20.80         0.963        6.143         2.407
Injection 3    20.39         0.978        5.774         2.211
Injection 4    20.60         0.948        5.878         2.257
Injection 5    20.34         0.927        5.893         2.371
Injection 6    20.71         0.940        6.068         2.439
Mean area      20.60         0.955        5.976         2.350
%RSD           0.95          2.13         2.47          3.25
Figure 5. RP-HPLC chromatograms obtained with OPD-N-acetylneuraminic acid and the sialic acid from
rIgG (x-axis: retention time, min).


SIALIC  ACID  ANALYSIS 

Determination  of  Sialic  Acids  as  Quinoxaline  Derivatives 

Since the sialic acids could not be derivatized with ABA, they were derivatized with
either DMPD (Anumula, 1994) or OPD to yield stable fluorescent quinoxaline derivatives
with excitation and emission maxima similar to those of the monosaccharide derivatives.
Experience with these derivatives indicated that the OPD derivatives are more amenable to
the chromatographic procedures than the DMPD derivatives (Anumula, 1994a). The sialic acid
derivatives were easily separated on the same column using the same eluents and quantitated
using the same detector settings. The optimum times for mild acid hydrolysis and
derivatization at 80 °C were 20 and 40 min, respectively. Typical chromatograms of the
N-acetylneuraminic acid-OPD standard and the sialic acid derivatives obtained from the rIgG
are shown in Figure 5.

Less than 2 pmol of sialic acid can be easily determined with either of these two tags.
Notably, even for the rIgG, with ~2.0% carbohydrate and ~95% neutral oligosaccharides,
the sialic acid content can be easily determined by this method. Reproducibility data for
the standard and the rIgG sample are shown in Table 5.

The relative standard deviation for the N-acetylneuraminic acid from the rIgG was
less than 3% at the ~6 pmol level.
Table 5. Reproducibility of sialic acid standard and
rIgG sample derivatized with OPD

HPLC           Peak area (×10^…)
               Standard      rIgG
               (100 pmol)    (6.32 pmol)
Injection 1    47.72         2.949
Injection 2    47.98         2.989
Injection 3    47.54         3.019
Injection 4    48.10         3.117
Injection 5    47.94         3.149
Injection 6    48.51         2.981
Mean area      47.96         3.034
%RSD           0.70          2.65
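The 6.32 pmol heading of the rIgG column is in line with a single-point external-standard calculation from the mean areas above (3.034 / 47.96 × 100 pmol, about 6.33 pmol). The paper does not spell out its calibration procedure, so the Python sketch below is an assumption: a response proportional to amount (a calibration line through the origin), calibrated with the 100 pmol standard.

```python
# Sketch: single-point external-standard quantitation.
# Assumption: peak area is proportional to amount (calibration line through the origin).
def external_standard_pmol(sample_area, standard_area, standard_pmol):
    """Amount in the sample injection, scaled from one standard injection."""
    return standard_pmol * sample_area / standard_area


# Mean peak areas from Table 5.
rigg_sialic_pmol = external_standard_pmol(3.034, 47.96, 100.0)
```

A single-point calibration is reasonable when the sample falls within the linear range of the detector; the sample here is about 16-fold below the standard, so linearity down to the low-picomole range is implicitly assumed.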

SUMMARY 

Monosaccharide composition analysis is a fundamental part of glycobiology. In
this regard, two novel and simple high performance liquid chromatographic (HPLC) methods
with fluorescence detection are described: one for the determination of neutral and amino
sugar residues and one for the determination of sialic acids in glycoproteins.

Following acid hydrolysis of the glycoproteins in 20% trifluoroacetic acid, the
released monosaccharides were labeled by reductive amination with anthranilic acid
(2-aminobenzoic acid) in the presence of sodium cyanoborohydride, and the derivatives were
separated from each other and from the excess reagent. The method is suitable for the
quantitative determination of less than 100 fmol of monosaccharides.

Sialic acids were released from glycoproteins by mild acid (NaHSO4) hydrolysis and
derivatized with commercial o-phenylenediamine·2HCl (OPD) to obtain stable fluorescent
quinoxaline derivatives. The OPD-sialic acid derivatives were separated on a C18 reversed
phase column using the same solvent systems as for the monosaccharide analysis. The
fluorescence properties of these derivatives were similar to those of the anthranilic acid
derivatives, and they were quantitated using the same excitation and emission wavelengths.

The two methods use the same C-18 column, mobile phase and detector wavelengths
of 230 nm and 430 nm for excitation and emission, respectively, but changing the
column for sialic acid analysis is recommended to save time. Both methods are
highly reproducible, with relative standard deviations within 3-4%, since
the procedures do not involve separation of excess reagents from the derivatives. Therefore,
the fluorescence methods reported here provide a convenient and efficient means of
determining both monosaccharides and sialic acids in glycoproteins with high sensitivity.
For the first time, the recovery of monosaccharides from the hydrolysates was determined
using the oligosaccharide map; the recovery ranged from 83 to 85% for most of the
glycoproteins examined.
INTRODUCTION

C9 is the last-acting protein in the complement cascade. Upon interaction with the
membrane-bound precursor complex composed of C5b, C6, C7 and C8, C9 undergoes a
conformational change from a monomeric globular plasma protein into an oligomeric
integral membrane protein, thus generating an amphiphilic channel across the lipid bilayer
of the target cell.

Mature C9 is a single-chain glycoprotein with an apparent molecular mass of about 71
kDa. Its complete sequence has been determined by cDNA analysis (DiScipio et al., 1984;
Stanley et al., 1985; Marazziti et al., 1988). C9 contains 538 amino acids, 24 of which are
half-cystines. Most of them occur in clusters that have been designated as the thrombospondin
type I repeat (TSP I), the low density lipoprotein receptor class A (LDL A) module and the low
density lipoprotein receptor class B (LDL B) or epidermal growth factor-like domain.

C9 shows partial sequence similarities to the other terminal complement proteins
(Haefliger et al., 1989; DiScipio and Hugli, 1989). All the structural motifs found in C9 are
also present, in variable numbers, in C6, C7, C8α and C8β.


EXPERIMENTAL  PROCEDURES 
Isolation 

C9  was  isolated  from  human  plasma  according  to  published  procedures  (Biesecker 
and  Mueller-Eberhard,  1980;  Dankert  et  al.,  1985),  which  involve  precipitation  with  BaCl2 
Figure 1. Schematic representation of the cleavage strategy applied for the identification of the disulfide bonds
of human C9.


and  polyethylene  glycol,  plasminogen  depletion  on  lysine-Bio-Gel,  and  chromatography  on 
DEAE-Sephacel  and  hydroxlapatite.  The  purity  was  checked  by  SDS-PAGE,  RP-HPLC,  and 
amino  acid  analysis. 


Fragmentation 

Cleavage with cyanogen bromide was carried out for 48 h in 70% trifluoroacetic acid
in the dark at a reagent:substrate ratio of 5:1 (w/w). BNPS-skatole was used in a 5-fold excess
(w/w) for 72 h in 60% acetic acid containing 0.2 M phenol.

Enzymatic digestions were generally performed at an enzyme:substrate ratio of
1:50, in the presence of 1 mM iodoacetamide to avoid disulfide interchange. Cleavage
with elastase was carried out in 0.2 M Tris-HCl, pH 8.8, for 96 h. Digestion with thermolysin was
performed in 10 mM borate buffer, pH 6.5, containing 50 mM NaCl and 2 mM CaCl2,
for 96 h. Subdigestion with V8-protease was carried out in 50 mM sodium phosphate
buffer, pH 7.8, at 37 °C for 74 h. Subdigestion with subtilisin was done in 10 mM
NH4HCO3, pH 8.0, at 37 °C for 30 h. Subdigestion with pepsin was performed in 10 mM
HCl at 37 °C for 40 h.
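As a worked example of the ratios given above, the sketch below computes reagent amounts for the 20 mg of C9 used in the cyanogen bromide experiment described under Results. The helper name `reagent_mass_mg` is hypothetical, and the 1:50 enzyme ratio is assumed to be by weight, which the text does not state explicitly.

```python
def reagent_mass_mg(substrate_mg: float, reagent_parts: float, substrate_parts: float) -> float:
    """Reagent mass (mg) for a reagent:substrate ratio given as reagent_parts:substrate_parts."""
    return substrate_mg * reagent_parts / substrate_parts

c9_mg = 20.0                                  # amount of native C9 (from the Results)
cnbr_mg = reagent_mass_mg(c9_mg, 5, 1)        # cyanogen bromide, 5:1 (w/w)
bnps_mg = reagent_mass_mg(c9_mg, 5, 1)        # BNPS-skatole, 5-fold excess (w/w)
enzyme_mg = reagent_mass_mg(c9_mg, 1, 50)     # protease, 1:50 (assumed w/w)
print(f"CNBr: {cnbr_mg} mg, BNPS-skatole: {bnps_mg} mg, enzyme: {enzyme_mg} mg")
```

For subdigestions the substrate mass is that of the fragment pool, not of intact C9, so the same helper would be called with the (usually smaller) pool mass.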
Preparation of Fragments

Fragments generated by cleavage with BNPS-skatole and cyanogen bromide were
separated by gel filtration on a Sephadex G-50 superfine column (1.8 x 90 cm) in 0.13 M
formic acid. Enzymatic digests were fractionated by RP-HPLC using a Bakerbond Butyl
column (4.6 x 250 mm, wide pore 33 nm, 5 µm; J.T. Baker Chemicals, Deventer, the
Netherlands) in a Hewlett-Packard 1090 liquid chromatograph (Hewlett-Packard, Waldbronn,
FRG). The acetonitrile gradient system used is described in Fig. 3. Final purification
was achieved on Aquapore Butyl, Phenyl, or Octadecyl columns (2.1 x 100 mm, 7 µm;
Brownlee Columns, Foster City, USA) or on an Aquapore RP-300 column (1.0 x 100 mm,
7 µm).
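Because solvent B in the gradient of Fig. 3 is itself only 80% acetonitrile, the nominal %B overstates the actual organic content. The following sketch converts elapsed time into approximate acetonitrile concentration, assuming a strictly linear gradient from 0 to 35% B over 50 min, ideal mixing, and no gradient delay volume; the function name is illustrative only.

```python
def acetonitrile_percent(t_min, b_start=0.0, b_end=35.0, duration_min=50.0, acn_in_b=80.0):
    """Approximate % acetonitrile (v/v) delivered at time t of a linear %B gradient."""
    t = min(max(t_min, 0.0), duration_min)        # clamp to the gradient window
    percent_b = b_start + (b_end - b_start) * t / duration_min
    return percent_b * acn_in_b / 100.0          # solvent A contains no acetonitrile

for t in (0, 25, 50):
    print(t, "min:", round(acetonitrile_percent(t), 1), "% acetonitrile")
```

At the end of the gradient (35% B) the column therefore sees roughly 28% acetonitrile.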

Identification  of  Cystine-Containing  Peptides 

Aliquots of each HPLC fraction were reduced with tri-n-butylphosphine, and the
generated thiol groups were specifically labeled with
ammonium 7-fluorobenzo-2-oxa-1,3-diazole-4-sulfonate (SBD-F) (Sueyoshi et al., 1985).
The resulting fluorescence intensities were measured with excitation at 385 nm and emission at 515 nm.

Alternatively, the cystine content was monitored by amino acid analysis,
using gas-phase hydrolysis with 6 M HCl containing 0.1% (v/v) phenol for 24 h at
115 °C under vacuum (Chang and Knecht, 1991). The liberated amino acids were
reacted with PITC, and the PTC derivatives were analyzed by RP-HPLC on a Nova Pak
Octadecyl column (3.9 x 150 mm; 4 µm; Waters, Milford, USA) (Bidlingmeyer et
al., 1984).

Sequence analysis was carried out in an Applied Biosystems 477A pulsed-liquid-phase
sequenator. PTH-amino acids were analyzed on-line according to the instructions of
the manufacturer. It is essential to use only DTT-free reagents and solvents, since DTT
would reduce the cystine during sequencing. Di-PTH-cystine was detected as a double
peak in the vicinity of PTH-tyrosine.


RESULTS  AND  DISCUSSION 

Generation  of  Cystine-Containing  Peptides 

Native C9 was cleaved with BNPS-skatole, thermolysin, elastase, or cyanogen
bromide. Selected cyanogen bromide fragment pools were subdigested with pepsin, subtilisin,
V8-protease, or thermolysin as outlined in Fig. 1.

The generated peptides were separated by either gel filtration or RP-HPLC, and all
fractions were examined for cystine content as described above. Positive fractions were usually
rechromatographed once or twice on different columns.

The chromatograms in Figs. 2 and 3 illustrate the strategy that led to the identification
of several disulfide bonds, primarily in the amino-terminal half of C9:

20 mg of native C9 were cleaved with cyanogen bromide. Upon gel filtration on a
Sephadex G-50 superfine column, the cyanogen bromide digest yielded 60 fractions. SDS-PAGE,
amino acid analysis, and sequence analysis indicated that fractions 23 to 26
consisted mainly of a fragment extending from Gln1 to Met272, whereas fraction 30 essentially
contained a chain extending from Lys464 to Lys538 (Fig. 2). Fractions 23 to 26 and fraction
30 were subsequently digested with subtilisin as described above. The digests were then
separated by RP-HPLC on a Bakerbond Butyl column, and the fractions were checked for cystine
content (Fig. 3). Further purification of the Cys-containing peptides was achieved on
Aquapore columns (chromatograms not shown).

Figure 2. Separation of cyanogen bromide fragments of native C9. Gel filtration on Sephadex G-50 superfine
(1.8 x 90 cm) using 0.13 M formic acid as eluant. Flow rate: 10 ml/h; fraction size: 5 ml; detection: absorbance
at 280 nm. Pools corresponding to fractions 23 to 26 and to fraction 30 are shaded.

Figure 3. Partial separation of the subtilisin digest of the pooled fractions 23 to 26. RP-HPLC on Bakerbond
Butyl (wide pore, 33 nm; 5 µm; 4.6 x 250 mm) using a linear acetonitrile gradient (0-35% B in 50 min).
Solution A: 0.1% TFA in H2O; solution B: 0.1% TFA in 80% acetonitrile. Flow rate: 1 ml/min; detection:
absorbance at 210 nm. Pools containing disulfide-paired peptides are marked with arrows, and the assigned
disulfide bond is indicated.
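Cyanogen bromide cleaves on the carboxyl side of methionine residues, which is why the fragments above end either at a methionine (Met272) or at the C-terminus (Lys538). A minimal sketch of an idealized in-silico CNBr digest is shown below; it runs on a short hypothetical sequence rather than on C9, and it ignores incomplete cleavage and the conversion of Met to homoserine/homoserine lactone.

```python
from typing import List, Tuple

def cnbr_fragments(seq: str, first_residue: int = 1) -> List[Tuple[int, int, str]]:
    """Split a sequence after every Met (idealized, complete CNBr cleavage).

    Returns (start, end, peptide) tuples numbered from first_residue.
    """
    fragments = []
    start = 0
    for i, aa in enumerate(seq):
        # Cleave C-terminal to Met, but not after the final residue.
        if aa == "M" and i < len(seq) - 1:
            fragments.append((first_residue + start, first_residue + i, seq[start:i + 1]))
            start = i + 1
    fragments.append((first_residue + start, first_residue + len(seq) - 1, seq[start:]))
    return fragments

# Hypothetical toy sequence, not taken from C9.
for start, end, pep in cnbr_fragments("QYTMSCRMKLCE"):
    print(start, end, pep)
```

Because the C9 digests were performed without reduction, fragments joined by a disulfide bond would in practice remain connected and elute together, which a purely sequence-based split like this does not model.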

Identification  of  the  Cystine-Containing  Peptides 

The Cys-containing peptides were first characterized by amino acid analysis, which
allowed a preliminary localization of the fragments within the known polypeptide chain of
C9.

Final identification of the disulfide bonds was achieved by amino acid sequence
analysis. The corresponding results are summarized in Table 1.

Assignment  of  the  Individual  Disulfide  Bonds  in  C9 

On  the  basis  of  the  Edman  degradation  data  the  disulfide  bridges  in  C9  were  assigned 
as  follows  (Fig.  4): 

The amino-terminal thrombospondin type I repeat (TSP I) contains six half-cystine residues,
which are joined in a 1 to 4, 2 to 3, and 5 to 6 pattern (Cys22-Cys57, Cys33-Cys36, Cys67-Cys73).
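The ordinal notation ("1 to 4") counts half-cystines within a module from its amino-terminal end. A small sketch that converts such a pattern into residue numbers is given below, checked against the TSP I assignment above and the LDL B assignment reported later in this section; the function name is illustrative only.

```python
from typing import List, Sequence, Tuple

def pairs_from_pattern(cys_positions: Sequence[int],
                       pattern: Sequence[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Map 1-based ordinal cysteine pairs onto residue numbers.

    cys_positions lists the module's half-cystines; they are sorted into sequence order.
    """
    positions = sorted(cys_positions)
    used = [i for pair in pattern for i in pair]
    # Each half-cystine can take part in only one disulfide bond.
    if len(used) != len(set(used)):
        raise ValueError("a cysteine appears in more than one pair")
    return [(positions[a - 1], positions[b - 1]) for a, b in pattern]

tsp1 = pairs_from_pattern([22, 33, 36, 57, 67, 73], [(1, 4), (2, 3), (5, 6)])
ldl_b = pairs_from_pattern([489, 492, 505, 507, 509, 518], [(1, 3), (2, 4), (5, 6)])
print(tsp1)   # [(22, 57), (33, 36), (67, 73)]
print(ldl_b)  # [(489, 505), (492, 507), (509, 518)]
```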

The pairing of the six cysteines in the adjacent low density lipoprotein receptor class
A (LDL A) module has not yet been fully established. On the basis of sequence comparisons with
complement factor I, however, a link between Cys80 and Cys91 was postulated (Sim et al.,
1993). Moreover, compositional analyses suggest that Cys86 is bonded to Cys113. Taken
together, these findings would indicate a linkage between Cys98 and Cys104, thus
implying an overall 1 to 3, 2 to 6, and 4 to 5 arrangement.
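The last step of this argument is an elimination: once two of the three bonds in a six-cysteine module are fixed, only one pairing remains for the other two half-cystines. A minimal sketch is shown below; it works with ordinal positions 1 to 6 so that it does not depend on residue numbering, and the function name is hypothetical.

```python
from itertools import chain
from typing import Iterable, Tuple

def remaining_pair(n_cys: int, known: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return the only possible pair for the cysteines not covered by the known bonds.

    Assumes every one of the n_cys half-cystines is disulfide-bonded within the module.
    """
    known = list(known)
    used = set(chain.from_iterable(known))
    if len(used) != 2 * len(known):
        raise ValueError("known bonds share a cysteine")
    free = sorted(set(range(1, n_cys + 1)) - used)
    if len(free) != 2:
        raise ValueError("elimination needs exactly two unassigned cysteines")
    return free[0], free[1]

# LDL A module: bond 1-3 (factor I homology) and bond 2-6 (compositional data).
print(remaining_pair(6, [(1, 3), (2, 6)]))  # (4, 5)
```

The deduced bond is only as secure as its two premises, both of which are themselves tentative here.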

The comparatively heterologous central part of the molecule, which contains six cysteine
residues, has been investigated previously. Protease digestion experiments indicated that
Cys121 is linked to Cys160, Cys233 to Cys234, and Cys359 to Cys384 (Stanley et al., 1985). These
results are fully confirmed by the sequence analysis data in Table 1. The very unusual
vicinal pairing of Cys233 and Cys234 is noteworthy.

Figure 4. Disulfide bond pairing of human C9. TSP I: thrombospondin type I module; LDL A: low density
lipoprotein receptor class A module; LDL B: low density lipoprotein receptor class B module (epidermal
growth factor precursor module). Definitively assigned disulfide bridges are indicated by solid lines; tentatively
assigned disulfide bridges are indicated by dashed lines.

Table 1. Edman degradation data and identification of disulfide bonds in human C9

Fragmentation    | Subfragmentation | Structural data                                   | Disulfide bond
-----------------|------------------|---------------------------------------------------|--------------------------
Thermolysin      | -                | AVGDRRQCVPTEPCEDA (60)                            | 67-73
Thermolysin      | -                | SEPRPPCR (115)                                    | 121-160
Elastase         | -                | CVPTEPCEDA (67)                                   | 67-73
BNPS-skatole     | -                | SQCDPCLR* (31)                                    | 33-36
Cyanogen bromide | Pepsin           | DCRCL / NKDDCV (380)                              | 359-384
Cyanogen bromide | V8-protease      | SSGSASHIDCR (13) / VFGQFNGKRCT (48)               | 22-57
Cyanogen bromide | V8-protease      | SEPRPPCRDR (115) / FYNGLCNRDR (155)               | 121-160
Cyanogen bromide | V8-protease      | FSVRKCHTCQ (484) / GKCLCACPF (503) / GIACE (515)  | 489-505, 492-507, 509-518
Cyanogen bromide | Thermolysin      | IDCR (20) / FNGKRCTDA (52)                        | 22-57
Cyanogen bromide | Subtilisin       | CDPCLR (33)                                       | 33-36
Cyanogen bromide | Subtilisin       | EQCCEET (231)                                     | 233-234
Cyanogen bromide | Subtilisin       | HIDC (19) / CT (57)                               | 22-57
Cyanogen bromide | Subtilisin       | SEPRPPC (115) / CNRDRDG (160)                     | 121-160
Cyanogen bromide | Subtilisin       | CVPTEPCED (67)                                    | 67-73
Cyanogen bromide | Subtilisin       | CDPCL (33)                                        | 33-36
Cyanogen bromide | Subtilisin       | SEPRPPCRD (115) / GLC (158)                       | 121-160
Cyanogen bromide | Subtilisin       | CACPF (507) / IAC (516)                           | 509-518

Only amino acids that were unambiguously identified are shown. Asterisks indicate chain ends
that have not been rigorously identified. Numbers in parentheses give the position of the first
residue of each peptide in mature C9; disulfide-linked peptides are separated by slashes.

The six cysteine residues of the low density lipoprotein receptor class B module
(LDL B) at the carboxyl terminus of C9 are connected in a 1 to 3, 2 to 4, and 5 to 6
pattern (Cys489-Cys505, Cys492-Cys507, Cys509-Cys518). These findings are entirely compatible
with the arrangement inferred by analogy to the known disulfide linkages in
epidermal growth factor (Savage et al., 1973) and complement component C1s (Hess et
al., 1991).

Comparison with the Other Components of the Membrane Attack Complex

Complement component C9 is partly similar to C6, C7, C8α, and C8β, with cysteine
as the most conserved of all amino acids (Haefliger et al., 1989; DiScipio and Hugli, 1989).
This is reflected in a similar molecular architecture of the terminal complement proteins.
As depicted in Fig. 5, all the late-acting complement components are built from the same set
of modules. In particular, the three types of modules found in C9 are present in all of the other
components.

The aim of this study was to establish the cystine pattern of human complement
component C9. Because of their common structural features, however, the disulfide-bonding
pattern of C9 can also help to predict cystine linkages in the other terminal complement
proteins.
Figure 5. Structural organization of the terminal complement proteins. The scheme shows the distribution of
the cysteine-rich modules in human C6, C7, C8α, C8β, and C9 (Haefliger et al., 1989; DiScipio and Hugli,
1989). TSP I: thrombospondin type I module; A: low density lipoprotein receptor class A module; B: low
density lipoprotein receptor class B module (epidermal growth factor precursor module); SCR 1, SCR 2: short
consensus repeat (sushi domain); FIM 1, FIM 2: factor I module. Cystines found outside modules are indicated
by solid lines.
INTRODUCTION

Phosphorylation  of  proteins  is  one  of  the  most  frequent  forms  of  posttranslational 
modification  in  eukaryotic  cells  and  is  linked  to  the  control  of  a  multitude  of  cellular 
functions. Proteins involved in this type of regulation are typically phosphorylated at only
one or a few sites. Another class of phosphoproteins carries multiple phosphorylations,
which usually serve functions different from those in proteins phosphorylated at single
sites. In multisite phosphorylated proteins (for review see Roach, 1991), the phosphate
groups often act as physical interaction partners for divalent metal ions, especially Ca2+.

Multiphosphorylated  proteins  are  important  factors  in  all  biologically  regulated 
mineralization  processes.  They  are  hypothesized  to  be  involved  in  the  nucleation  of  mineral 
crystals  within  and  upon  the  organic  matrix  of  tissues  such  as  bone  and  dentin.  When  crystal 
growth  has  been  induced,  the  multiphosphorylated  proteins  are  thought  to  regulate  the  rate 
of  mineralization  by  adsorption  to  the  surface  of  the  crystals. 

Besides calcifying tissues, multiphosphorylated proteins are especially abundant in
physiological fluids containing high Ca2+ concentrations, such as saliva, urine, and milk. In
saliva and urine, the phosphoproteins are thought to form complexes with developing
calcium salt crystals, thereby inhibiting the formation of dentin plaques (Holt and van
Kemenade, 1989) and urinary stones (Shiraga et al., 1992), respectively.
The significance of the position of the phosphorylations in these multiphosphorylated
proteins is unknown. It is likely, however, that the placement of phosphoserine residues at specific
locations in the protein is important for the regulation of crystallization.

The inability of the standard Edman degradation procedure to deal with phosphoamino
acids has made the localization of in vivo phosphorylation sites a laborious
procedure. If a phosphopeptide contains more than one serine residue, subfragmentation,
purification, and amino acid analysis of smaller peptides are necessary, and the task
becomes even more tedious when two or more serines occur in series.

A method to overcome these difficulties has been reported. Meyer et al. (1986)
described a one-step micro batch reaction in which peptide-bound phosphoserine is quantitatively
converted into S-ethylcysteine, while serines without O-linked phosphate are left
unaffected. Subjected to Edman degradation chemistry, S-ethylcysteine yields a stable
PTH derivative that elutes between valine and DPTU in the system used (Applied
Biosystems 477A/120A), making it possible to assign phosphoserines directly by
automated sequencing. This technique has been employed to localize in vivo
phosphorylation sites in several proteins containing one or a few phosphorylations (for review,
see Meyer et al., 1993). Here we demonstrate that the method is also applicable to the
localization of phosphorylation sites in multiphosphorylated proteins.

Strategy 

The strategy used for the localization of phosphorylation sites in a multiphosphorylated
protein is outlined in Figure 1. Osteopontin, a heavily phosphorylated glycoprotein,
was isolated from bovine milk (Sorensen and Petersen, 1993a). The protein
was digested with proteases, and the digests were separated by reverse-phase HPLC. Fractions
containing phosphoamino acids were identified by amino acid analysis and further
purified. Aliquots of the phosphopeptides were subjected to mass spectrometric analysis.
Peptides containing phosphoserine were subjected to the ethanethiol treatment, which
converts phosphoserine to S-ethylcysteine, and were subsequently sequenced.
The S-ethylcysteine derivatization technique is discussed below on the basis of selected
phosphopeptides from osteopontin.

Modification  of  Phosphoserine  with  Ethanethiol 

The S-ethylcysteine derivatization of peptide-bound phosphoserine proceeds by
β-elimination of the phosphate group followed by addition of ethanethiol to the
double bond of the dehydroalanine intermediate (Figure 2). The experimental procedure
was performed essentially as described (Meyer et al., 1991). The dried peptide was incubated
for 1 h at 50 °C under nitrogen with 50 µL of a freshly prepared derivatization mixture
(80 µL ethanol, 60 µL ethanethiol, 65 µL 5 M NaOH, 400 µL H2O). The sample was then
cooled to room temperature and neutralized by addition of 10 µL glacial acetic acid.
Derivatized samples were vacuum-dried and frozen. Prior to sequencing, the samples were
dissolved in 50 µL 0.1% trifluoroacetic acid and applied to Biobrene-treated glass fiber
filters.
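The two reaction steps can be checked by simple elemental bookkeeping: phosphoric acid (H3PO4) is lost on β-elimination, leaving a dehydroalanine residue, and ethanethiol (C2H5SH) is then added across the double bond. The sketch below does this with residue formulas and standard average atomic masses; the variable names are illustrative, and the mass differences are computed here rather than reported in the text.

```python
from collections import Counter

# Standard average atomic masses (Da).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}

def mass(formula: Counter) -> float:
    """Average mass of an elemental formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())

# Residue formulas (as they occur inside a peptide chain).
PHOSPHOSERINE = Counter({"C": 3, "H": 6, "N": 1, "O": 5, "P": 1})
H3PO4 = Counter({"H": 3, "P": 1, "O": 4})
ETHANETHIOL = Counter({"C": 2, "H": 6, "S": 1})

# Step 1: beta-elimination of phosphoric acid gives dehydroalanine (C3H3NO).
dehydroalanine = PHOSPHOSERINE - H3PO4
# Step 2: addition of ethanethiol across the C=C bond gives S-ethylcysteine (C5H9NOS).
s_ethylcysteine = dehydroalanine + ETHANETHIOL

print(dict(dehydroalanine), round(mass(dehydroalanine), 2))
print(dict(s_ethylcysteine), round(mass(s_ethylcysteine), 2))
print("net change per phosphoserine:",
      round(mass(s_ethylcysteine) - mass(PHOSPHOSERINE), 2), "Da")
```

Relative to unmodified serine (C3H5NO2), the derivatized residue is about 44 Da heavier.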


RESULTS  AND  DISCUSSION 

Amino acid analysis of the peptide LPVKPTSSGSSEEK, corresponding to residues
1-14 in bovine osteopontin, showed that the peptide contained phosphoserine. The
phosphoserine(s) could potentially be located at any of the four serines in the peptide. Yields of
the PTH-amino acids generated by sequencing of the native as well as the ethanethiol-treated
peptide are shown in Table 1 (the yields in the two experiments are not comparable, owing to
different amounts of sample applied to the sequencer).

Sequencing of the native phosphopeptide shows that the yield of PTH-serine is zero
or very low in all cycles predicted to contain a serine. This indicates that the four serines in
the peptide are modified, but direct evidence for phosphorylation is missing. Treatment
of the phosphopeptide with ethanethiol as described, followed by sequencing, gave
PTH-S-ethylcysteine at all four serine residues (PTH-S-ethylcysteine yields are shown in
parentheses), thereby showing that all four serines in this peptide are actually phosphorylated.
Mass spectrometric analysis of the underivatized peptide (data not shown) revealed an excess mass
of 320.5 Da compared to the calculated peptide mass, which corresponds to four phosphate
groups. In this example, the phosphorylations could have been assigned simply by sequencing
the native peptide and combining the result with the mass spectrometric data, but this is not
the case when the phosphorylation pattern becomes more complicated.

Figure 1. Strategy for localization of phosphorylation sites in multiphosphorylated proteins.

Figure 2. S-ethylcysteine derivatization of peptide-bound phosphoserine. Experimental procedures are
described in the text.

Table 1. Sequence analysis of the peptide LPVKPTSSGSSEEK,
corresponding to residues 1-14 in bovine osteopontin

Amino acid     | PTH-AA yields      | PTH-AA yields
sequence(a)    | native peptide(b)  | EtSH peptide(c)
---------------|--------------------|----------------
L1             | 419                | 215 (1)
P2             | 154                | 81 (17)
V3             | 207                | 100 (20)
K4             | 87                 | 46 (7)
P5             | 59                 | 63 (5)
T6             | 20                 | 11 (4)
S7             | 3                  | 0 (43)*
S8             | 0                  | 0 (58)*
G9             | 37                 | 28 (32)
S10            | 6                  | 0 (41)*
S11            | 5                  | 0 (55)*
E12            | 12                 | 9 (30)
E13            | 12                 | 15 (17)
K14            | (d)                | (d)

(a) Amino acid sequence according to the cDNA (Kerr et al., 1991).
(b) PTH-amino acid yields obtained by sequencing of the native, underivatized peptide.
(c) PTH-amino acid yields obtained by sequencing of the ethanethiol-treated peptide. PTH-S-ethylcysteine
yields, given in parentheses for each sequence step, are quantified by use of the response factor for
PTH-methionine. Sequence cycles in which PTH-S-ethylcysteine was assigned are marked by asterisks.
(d) The yields of PTH-lysine were not determined.
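Going from a mass excess to a number of phosphate groups amounts to dividing by the mass added per phosphorylation (HPO3, about 79.98 Da average mass). A minimal sketch follows, assuming average masses and a hypothetical acceptance window of 1 Da, applied to the excess masses reported here for residues 1-14 and, further on, for residues 182-199.

```python
HPO3_AVERAGE_MASS = 79.98  # Da added per phosphorylation (average mass of HPO3)

def count_phosphates(mass_excess_da: float, tolerance_da: float = 1.0) -> int:
    """Nearest whole number of phosphate groups explaining a mass excess.

    tolerance_da is an assumed acceptance window; adjust it to the accuracy
    of the mass spectrometer actually used.
    """
    n = round(mass_excess_da / HPO3_AVERAGE_MASS)
    residual = mass_excess_da - n * HPO3_AVERAGE_MASS
    if abs(residual) > tolerance_da:
        raise ValueError(f"{mass_excess_da} Da is not a whole number of phosphates")
    return n

print(count_phosphates(320.5))  # residues 1-14
print(count_phosphates(160.6))  # residues 182-199
```

The mass alone gives the number of phosphate groups, not their positions; that information has to come from sequencing.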

Amino acid analysis showed that the peptide TSQLTDHSKETNSSELSK, corresponding to
residues 182-199 in bovine osteopontin, contains phosphoamino acids. Mass
spectrometric analysis of the peptide showed an MH+ ion at m/z 2153.7 (Figure 3). This mass
exceeds the calculated peptide mass by 160.6 Da, corresponding to two phosphorylations.
The phosphopeptide contains five serines and three threonines, a total of eight
potential phosphorylation sites. In a case like this, where the phosphopeptide contains both
phosphorylated and unphosphorylated hydroxy amino acids, sequence analysis is necessary
to localize the phosphorylations. The phosphopeptide was subjected to the
ethanethiol treatment and subsequently sequenced (Table 2). Sequence analysis revealed
PTH-S-ethylcysteine in the cycles corresponding to Ser189 and Ser194 in the native protein.
The PTH-serine and PTH-S-ethylcysteine yields show that the conversion
of these serines was essentially complete, while the other three serines in the sequence were
not affected by the treatment. In cases like this, S-ethylcysteine derivatization is essential
for the localization of the phosphorylation sites.

Figure 3. PDMS spectra of the peptide TSQLTDHSKETNSSELSK, corresponding to residues 182-199 in
bovine osteopontin.
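Reading the sequencing tables involves weighing the PTH-S-ethylcysteine yield in a serine cycle against both the residual PTH-serine and the carry-over (lag) from the preceding cycle. In Table 2, for example, the S-ethylcysteine signal at Ser195 (11) is lower than at Ser194 (22) and is best read as carry-over. The sketch below encodes that reasoning as a simple, hypothetical screening rule, not the criterion used by the authors: a serine is called phosphorylated when its S-ethylcysteine yield exceeds both its PTH-serine yield and the S-ethylcysteine yield of the previous cycle. The yields are those of Table 2.

```python
from typing import List, Tuple

# (residue label, PTH-amino acid yield, PTH-S-ethylcysteine yield) for one cycle
Cycle = Tuple[str, float, float]

def assign_phosphoserines(cycles: List[Cycle]) -> List[str]:
    """Hypothetical rule: call a Ser cycle phosphorylated if its S-ethylcysteine
    yield exceeds both its PTH-Ser yield and the previous cycle's
    S-ethylcysteine yield (the latter discounts carry-over)."""
    assigned = []
    for i, (label, pth_yield, etcys) in enumerate(cycles):
        if not label.startswith("S"):
            continue
        previous_etcys = cycles[i - 1][2] if i > 0 else 0.0
        if etcys > pth_yield and etcys > previous_etcys:
            assigned.append(label)
    return assigned

# Ethanethiol-treated peptide TSQLTDHSKETNSSELSK (Table 2), residues 182-199.
table2 = [
    ("T182", 166, 1), ("S183", 63, 4), ("Q184", 75, 1), ("L185", 146, 1),
    ("T186", 55, 1), ("D187", 26, 1), ("H188", 9, 7), ("S189", 2, 53),
    ("K190", 52, 34), ("E191", 33, 15), ("T192", 26, 8), ("N193", 16, 5),
    ("S194", 2, 22), ("S195", 7, 11), ("E196", 14, 6), ("L197", 19, 3),
    ("S198", 7, 1), ("K199", 9, 2),
]
print(assign_phosphoserines(table2))  # ['S189', 'S194']
```

Such a rule is only a screening aid: it relies on complete derivatization and cannot detect phosphoserines that fail to form S-ethylcysteine, such as the Ser-Pro case discussed below.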

Limitations  of  the  Technique 

It has been observed that certain cysteines give false-positive results for S-ethylcysteine
when subjected to the ethanethiol treatment. This problem has been solved by performic
acid oxidation of cysteines to cysteic acid before subjecting the peptide or protein to the
ethanethiol treatment (Meyer et al., 1989).

O-glycosylated serine residues have also been hypothesized to yield S-ethylcysteine
with this method, but in combination with a second analysis, such as carbohydrate or
mass spectrometric analysis, peptides containing O-glycosylations can easily be identified.
Moreover, a procedure for the selective elimination of phosphate groups has
been described (Byford, 1991).

The position of the phosphoserine in the peptide subjected to the ethanethiol treatment
is also important. If the phosphoserine residue is the N-terminal amino acid of the
phosphopeptide, S-ethylcysteine is not formed during the ethanethiol
treatment. Instead, elimination of the phosphate group is followed by rearrangement of the
double bond in the dehydroalanine intermediate, resulting in pyruvate formation (Meyer et
al., 1991). Likewise, ethylamine is formed during the elimination of phosphoserines
located at the C-terminus of the phosphopeptide.

Table 2. Sequence analysis of the peptide TSQLTDHSKETNSSELSK,
corresponding to residues 182-199 in bovine osteopontin

Amino acid   | PTH-AA yields     | Amino acid   | PTH-AA yields
sequence(a)  | EtSH peptide(b)   | sequence(a)  | EtSH peptide(b)
-------------|-------------------|--------------|----------------
T182         | 166 (1)           | E191         | 33 (15)
S183         | 63 (4)            | T192         | 26 (8)
Q184         | 75 (1)            | N193         | 16 (5)
L185         | 146 (1)           | S194         | 2 (22)*
T186         | 55 (1)            | S195         | 7 (11)
D187         | 26 (1)            | E196         | 14 (6)
H188         | 9 (7)             | L197         | 19 (3)
S189         | 2 (53)*           | S198         | 7 (1)
K190         | 52 (34)           | K199         | 9 (2)

(a) Amino acid sequence according to the cDNA (Kerr et al., 1991).
(b) PTH-amino acid yields obtained by sequencing of the ethanethiol-treated peptide. PTH-S-ethylcysteine
yields, given in parentheses for each sequence step, are quantified by use of the response factor for
PTH-methionine. Sequence cycles in which PTH-S-ethylcysteine was assigned are marked by asterisks.

In our work with osteopontin, we have encountered another sequence position of
phosphoserine that seems to hinder the formation of S-ethylcysteine, namely the presence
of a proline residue as the C-terminal neighbour. Mass spectrometric analysis of the peptide
TLPSKSNESPEQ, comprising residues 57-68 in bovine osteopontin, showed an MH+ ion at
m/z 1559.3, corresponding to the calculated peptide mass plus three phosphorylations
(Figure 4). The peptide was subjected to the ethanethiol treatment and subsequently
sequenced (Table 3). The yield of PTH-threonine in cycle one is at the expected level,
indicating that the phosphorylations must be located at the three serines in the sequence. As
anticipated, PTH-S-ethylcysteine was observed at positions corresponding to Ser60 and
Ser62. However, at the position corresponding to Ser65 virtually no PTH-amino acid was
identified. Similarly, two phosphoserines in κ-casein, both followed by prolines, failed
to yield S-ethylcysteine when subjected to the ethanethiol treatment (Rasmussen et al.,
unpublished data). These results strongly indicate that addition of ethanethiol to the double
bond of the dehydroalanine intermediate is impaired when it is neighboured by a proline residue
on the C-terminal side.

Table 3. Sequence analysis of the peptide TLPSKSNESPEQ,
corresponding to residues 57-68 in bovine osteopontin

Amino acid   | PTH-AA yields     | Amino acid   | PTH-AA yields
sequence(a)  | EtSH peptide(b)   | sequence(a)  | EtSH peptide(b)
-------------|-------------------|--------------|----------------
T57          | 146 (1)           | N63          | 47 (25)
L58          | 213 (0)           | E64          | 38 (6)
P59          | 100 (1)           | S65          | 5 (4)
S60          | 2 (89)*           | P66          | 40 (2)
K61          | 90 (19)           | E67          | 26 (1)
S62          | 2 (91)*           | Q68          | 29 (1)

(a) Amino acid sequence according to the cDNA (Kerr et al., 1991).
(b) PTH-amino acid yields obtained by sequencing of the ethanethiol-treated peptide. PTH-S-ethylcysteine
yields, given in parentheses for each sequence step, are quantified by use of the response factor for
PTH-methionine. Sequence cycles in which PTH-S-ethylcysteine was assigned are marked by asterisks.

Figure 4. MALDI-TOF spectra of the peptide TLPSKSNESPEQ, corresponding to residues 57-68 in bovine
osteopontin.

As seen in Table 3, a proline residue as the N-terminal neighbour of
a phosphoserine (Ser60) does not affect the formation of S-ethylcysteine. We are not aware
of any phosphoserine located in a -Ser(P)-Pro- sequence motif that has been identified as
S-ethylcysteine after ethanethiol treatment.

As  summarized  here,  the  technique  has  certain  limitations  but  these  can  easily  be 
managed  by  taking  the  appropriate  precautions  in  the  individual  cases,  or  simply  confirming 
phosphorylations  by  mass  spectrometric  analysis. 
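The sequence-context limitations summarized above can be turned into a simple pre-sequencing checklist. The sketch below is a hypothetical helper, not part of the published procedure: it scans a peptide and flags residues whose ethanethiol result should be interpreted with care, namely cysteines (possible false positives unless oxidized first) and serines at the N-terminus, at the C-terminus, or followed by proline. O-glycosylation cannot be predicted from sequence alone and is not checked.

```python
from typing import Dict

def derivatization_warnings(peptide: str, first_residue: int = 1) -> Dict[int, str]:
    """Flag positions where S-ethylcysteine results may be misleading.

    Keys are residue numbers, counted from first_residue.
    """
    warnings: Dict[int, str] = {}
    last = len(peptide) - 1
    for i, aa in enumerate(peptide):
        pos = first_residue + i
        if aa == "C":
            warnings[pos] = "Cys: possible false positive; oxidize with performic acid first"
        elif aa == "S":
            if i == 0:
                warnings[pos] = "N-terminal Ser: pyruvate forms instead of S-ethylcysteine"
            elif i == last:
                warnings[pos] = "C-terminal Ser: elimination reported to give ethylamine"
            elif peptide[i + 1] == "P":
                warnings[pos] = "Ser-Pro: ethanethiol addition appears to be impaired"
    return warnings

for pos, note in derivatization_warnings("TLPSKSNESPEQ", first_residue=57).items():
    print(pos, note)
```

Applied to residues 57-68 of osteopontin, only Ser65 is flagged, in line with the missing PTH-S-ethylcysteine signal at that position.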


CONCLUSION 

By this combination of amino acid analysis, sequencing of S-ethylcysteine-derivatized
phosphopeptides, and mass spectrometric analysis, we have identified a total
of 27 phosphoserines and one phosphothreonine in bovine osteopontin (Sorensen and
Petersen, 1994; Sorensen et al., 1995). Osteopontin contains 41 serines, 17 threonines, and 2
tyrosines, which make up 60 potential phosphorylation sites, corresponding to 23% of the
amino acids in the protein. A considerable number of these serines and threonines occur in
series of 4-5 potential phosphorylation sites. This clustering of potential phosphorylation
sites makes the localization of phosphoamino acids by the subfragmentation method impossible.
Even mass spectrometric analysis of a series of extensive digests of the
protein would not result in unambiguous identification of the phosphorylation sites in this
multiphosphorylated protein.

Likewise, we have used the S-ethylcysteine derivatization method to identify five
phosphoserines in component PP3 (Sorensen and Petersen, 1993b,c), a major non-casein
phosphoprotein from milk, which has recently been shown to be homologous with mouse
and rat adhesion molecule GlyCAM-1 (Johnsen et al., 1995). In PP3 the five phosphorylations
were all located within a stretch of 18 amino acids, which would have made localization
by the subfragmentation method very difficult. With the S-ethylcysteine derivatization
procedure, all five phosphorylations were assigned in a single sequence run.

The  examples  discussed  here  emphasize  the  importance  of  both  mass  spectrometric 
and  sequence  data  in  the  exact  localization  of  phosphorylated  amino  acids. 
INTRODUCTION

Guanine nucleotide binding proteins, or G-proteins, function as molecular switches in
a diverse set of signaling pathways by coupling seven-helix transmembrane receptors to
specific intracellular effectors (Kaziro et al., 1991; Dohlman et al., 1991). G-proteins are
heterotrimers composed of α-, β-, and γ-subunits. Activation of the appropriate receptor
causes a GDP molecule bound to the resting form of a G-protein to be exchanged for GTP.
As a consequence, the G-protein dissociates into the α-subunit complexed to GTP and
the βγ-dimer. The GTP-bound conformation of the α-subunit is capable of activating or
inhibiting a variety of downstream effectors, including enzymes as well as ion channels
(Birnbaumer, 1992; Hepler & Gilman, 1992; Simon et al., 1991). The released βγ-complex
can itself activate or modulate some effectors (Logothetis et al., 1987; Tang et al., 1991; Katz
et al., 1992). A GTPase-controlled timing mechanism inherent in all α-subunits and, in some
cases, modulated by other proteins (Berstein et al., 1992; Arshavsky & Bownds, 1992)
returns the GTP-activated α-subunit to the inactive GDP-bound conformation. The α-subunit
complexed to GDP reassociates with the βγ-complex and re-forms the heterotrimer
in its resting state. Conklin & Bourne (1993) proposed a structural model for a general
G-protein α-subunit on the basis of biochemical, immunologic, and molecular genetic
observations. This model provided a blurred but revealing view of the orientation of
membrane-bound Gα with regard to Gβγ, receptors, and effectors.

One of the best characterized heterotrimeric G-protein-coupled systems is the visual
cascade of retinal rod outer segments (Lagnado & Baylor, 1992; Hargrave & McDowell,
1992). Here, the visual G-protein, transducin, serves as an intermediary between the
photoreceptor rhodopsin (R) and the effector protein cGMP phosphodiesterase (cGMP PDE)
during signaling. Visual signal transduction begins with the absorption of a photon by the
11-cis-retinal chromophore of R. Rapid photoisomerization to all-trans-retinal triggers a
series of conformational changes that lead to the formation of the activated
intermediate metarhodopsin II (R*). R* binds the heterotrimeric GDP-bound form of
transducin (Tαβγ) and catalyses the exchange of GTP for GDP, resulting in the dissociation
of Tαβγ into Tα-GTP and Tβγ. Tα-GTP, in turn, activates the potent cGMP PDE by binding and
displacing its inhibitory subunits. The resulting decrease in the concentration of the second
messenger cGMP causes cation-specific cGMP-gated channels to close, leading to hyperpolarization
of the rod cell membrane and to the generation of the nerve impulse. As a result of its
intrinsic GTPase activity, Tα is inactivated by hydrolysis of GTP to GDP, returning the system
to its resting state.

Recently, the three-dimensional structures of a 325-amino acid fragment of Tα bound to either GTPγS (Noel et al., 1993) or GDP (Lambright et al., 1994) have been solved. Together, the two Tα structures furnish contrasting freeze-frame pictures of two key intermediates in the G-protein cycle. Although both structures are quite similar, the differences induced by nucleotide exchange are localized to three adjacent regions on one face of the protein, which have been implicated in effector activation (Lambright et al., 1994). However, little information is available concerning the contacts among transducin subunits. Furthermore, it is not known which residues are directly involved in transducin-rhodopsin and transducin-cGMP PDE interactions. Using cupric phenanthroline, a reagent known to catalyse the formation of disulfide bonds between suitably placed sulfhydryl groups, Bubis & Khorana (1990) found that Cys-25 of Tβ is in close proximity to Cys-36 and/or Cys-37 of Tγ. To continue the studies on the structure and function of transducin, we have used group-specific labeling and chemical cross-linking techniques to identify some of the functionally important amino acid residues of the protein.


EXPERIMENTAL  PROCEDURES 
Materials 

Bovine eyes were obtained from the nearest slaughterhouse (Matadero Caracas, C.A., Venezuela). Retinae were removed in the dark under red light and maintained frozen at -70°C. Chemical reagents were obtained from the following suppliers: [2-3H]iodoacetic acid ([3H]IAA, 131 mCi/mmol), β,γ-imido-[3H]guanosine 5'-triphosphate ([3H]GMP-PNP, 12.8 Ci/mmol), and [32P]GTP (30 Ci/mmol), Amersham; [8,5-3H]GTP (15 Ci/mmol), American Radiolabeled Chemicals Inc.; 4-acetamido-4'-maleimidylstilbene-2,2'-disulfonic acid (AMDA) and 2,5-dimethoxystilbene-4'-maleimide (DM), Molecular Probes, Inc.; 2-nitro-5-thiocyanobenzoic acid (NTCBA), N,N'-dicyclohexylcarbodiimide (DCCD), phenyl isothiocyanate (PITC), o-phthalaldehyde (OPA), acetic anhydride (AA), and dansyl chloride (DnsCl), Sigma; iodoacetic acid (IAA), Kodak; 4-vinylpyridine (VP), Fluka; 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) and citraconic anhydride (CA), Pierce; N,N'-1,2-phenylenedimaleimide (o-PDM) and N,N'-1,4-phenylenedimaleimide (p-PDM), Aldrich. All other reagents were analytical grade.

Rod  Outer  Segments  and  Washed  Membranes 

Rod outer segments (ROS) were isolated from frozen bovine retinae by flotation and subsequent centrifugation on discontinuous sucrose gradients, as described previously (Bubis & Khorana, 1990; Bubis et al., 1993). ROS membranes were washed with 2 mM EDTA (Baehr et al., 1979) or 5 M urea (Shichi & Somers, 1978) to remove peripheral ROS proteins. Washed ROS were used as the source of rhodopsin in guanine nucleotide binding and GTPase assays.

Transducin  Isolation 

Transducin was isolated from ROS membranes prepared under room light at 4°C, following the affinity binding procedure of Kühn (1980). GTP (100 µM) was used to elute transducin from the washed, illuminated ROS membranes (Baehr et al., 1982), and transducin was further purified to homogeneity by ion-exchange chromatography on DE52, as described elsewhere (Bubis & Khorana, 1990; Bubis et al., 1993).

Isolation of Tα and Tβγ by Tandem Chromatography on Blue Agarose and ω-Aminooctyl Agarose

GTP-extracted transducin was supplemented with 100 µM EDTA and 10% glycerol and chromatographed on a blue agarose column followed by an ω-aminooctyl agarose column (Bubis & Khorana, 1990). The α-subunit, which bound to the blue agarose column, was eluted following the procedure of Shichi et al. (1984). The βγ-complex was eluted from the ω-aminooctyl agarose as described by Fung (1983).

Binding of [3H]GMP-PNP or [3H]GTP to Transducin

Guanine nucleotide binding to native or reconstituted transducin was measured by Millipore filtration, as described previously for cyclic nucleotide binding to cAMP-dependent protein kinase (Bubis & Taylor, 1985; 1987). The binding reaction was carried out in Buffer I [10 mM HEPES (pH 7.4), 100 mM NaCl, 5 mM magnesium acetate, 5 mM β-mercaptoethanol] containing 0.1 µM rhodopsin (as urea-washed ROS membranes) and a fixed concentration (0.2 µM) of [3H]GMP-PNP or [3H]GTP.

Modification of Transducin Cysteinyl, Acidic, and Lysyl Residues

For cysteine modification, transducin (1-2 µM) was incubated with DM (5 mM), AMDA (5 mM), NTCBA (5 mM), [3H]IAA (4 mM), or VP (74.9 mM) in 20 mM Tris-HCl (pH 8.0), 1 mM dithiothreitol (DTT), 100 mM NaCl, and 5 mM magnesium acetate. To label acidic residues, transducin (0.2 µM) was incubated with 5 mM EDC or DCCD in 50 mM PIPES (pH 6.2) and magnesium acetate (20 or 30 mM). For lysine labeling, transducin (0.2 µM) was incubated with 5 mM PITC, OPA, AA, CA, or DnsCl in 0.1 M Tris-HCl (pH 8.0). At designated time intervals (0-60 min), the reaction mixtures for the cysteine, acidic residue, and lysine modifications were terminated with 5 mM DTT, 12 mM acetic acid (pH 6.3), or 15 mM β-mercaptoethanol plus 30 mM magnesium acetate, respectively. The kinetics of transducin inactivation were then assayed by measuring rhodopsin-dependent [3H]GMP-PNP or [3H]GTP binding. Similar procedures were used to modify Tα and Tβγ: Tα and Tβγ (0.15 µM) were incubated with the chemical reagents and, after the reactions were terminated, each was reconstituted with 0.15 µM of the complementary functional unit to re-form the holoenzyme.

Transducin (0.2-0.4 mg) was also modified with [3H]IAA, AMDA, and VP under native conditions. AMDA-modified transducin was chromatographed on a Sephadex G-25 gel filtration column. The fractions containing protein were pooled, dialyzed extensively against 50 mM NH4HCO3 (pH 8.3), and concentrated. IAA- and VP-modified transducin were also dialyzed against 50 mM NH4HCO3 (pH 8.3). Labeled transducin samples were then digested with TPCK-treated trypsin at a 50:1 molar ratio of protein to protease for 24 h at 37°C and lyophilized, and the resulting peptides were separated by high performance liquid chromatography (HPLC).
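For planning a digest at the stated 50:1 molar ratio, the amount of trypsin follows from the two molecular masses. The sketch below is illustrative only: the masses it uses (about 83 kDa for the transducin heterotrimer, taken as roughly 39 + 36 + 8 kDa for Tα, Tβ, and Tγ, and about 23.3 kDa for trypsin) are assumed nominal values, not figures reported here, and `trypsin_mass_mg` is a hypothetical helper.

```python
# Hypothetical helper for a digest at a fixed protein:protease molar ratio.
# Assumed nominal molecular masses (not reported in the text):
TRANSDUCIN_DA = 83_000.0  # heterotrimer, ~39 + 36 + 8 kDa (assumption)
TRYPSIN_DA = 23_300.0     # bovine trypsin (assumption)


def trypsin_mass_mg(protein_mg: float,
                    molar_ratio: float = 50.0,
                    protein_da: float = TRANSDUCIN_DA,
                    protease_da: float = TRYPSIN_DA) -> float:
    """Return the mass of trypsin (mg) giving `molar_ratio` mol protein per mol trypsin."""
    if protein_mg <= 0 or molar_ratio <= 0:
        raise ValueError("protein mass and molar ratio must be positive")
    protein_mmol = protein_mg / protein_da      # mg / (g/mol) = mmol
    protease_mmol = protein_mmol / molar_ratio  # 50:1 protein:protease
    return protease_mmol * protease_da          # mmol * (g/mol) = mg


if __name__ == "__main__":
    for mg in (0.2, 0.4):
        print(f"{mg} mg transducin -> {trypsin_mass_mg(mg) * 1000:.2f} ug trypsin")
```

Under these assumed masses, 0.2-0.4 mg of transducin corresponds to roughly 1.1 to 2.2 µg of trypsin; a weight-based (w/w) ratio would give a different amount.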

Interaction  Assay  between  Modified  Transducin  and  Photoexcited 
Rhodopsin 

Transducin was incubated with 5 mM DM, AMDA, NTCBA, or IAA, or with 74.9 mM VP. After 30 min of incubation at 4°C, the reactions were terminated with 5 mM DTT. These samples were designated T(Cys-X) (X = H or labeling group). EDTA-washed ROS (3.5 µM rhodopsin) were mixed with T(Cys-X) (2.4 µM) in an isotonic solution [5 mM Tris-HCl (pH 7.5), 0.1 M NaCl, 5 mM magnesium acetate, 5 mM β-mercaptoethanol] and incubated for 1 h at 4°C under light. To assess whether a functional T-R* complex was formed, successive cycles of centrifugation (40,000 rpm, 30 min) and extraction were carried out. The initial supernatants (S-ISO) were separated, and the pellets were resuspended in a hypotonic solution [5 mM Tris-HCl (pH 7.5), 5 mM magnesium acetate, 5 mM β-mercaptoethanol] (PR). After centrifugation, the supernatants (S-HYPO) were collected and the resulting pellets were extracted in the hypotonic buffer containing 150 µM GTP (PR-GTP). The PR-GTP samples were centrifuged as before, and the supernatants (S-GTP) were separated from the final pellets (Pf). During the course of these experiments, aliquots were taken from T(Cys-X), S-ISO, S-HYPO, PR-GTP, S-GTP, and Pf for analysis by SDS-polyacrylamide gel electrophoresis and Western blot.

Assay  of  Transducin  Functionality  in  the  T-R*  Complex  Incubated  with 
Sulfhydryl  Group-Specific  Reagents 

Transducin (1.4 µM) was incubated with EDTA-washed ROS (5.6 µM rhodopsin) for 30 min at 4°C under light. The mixture was centrifuged at 40,000 rpm for 30 min, and T-R* was recovered in the sedimented pellets. The pellets were resuspended in Buffer II (hypotonic buffer, pH 8.0) and incubated with 5 mM DM, AMDA, NTCBA, or IAA, or with 74.9 mM VP. After a 30-min incubation on ice, the reactions were stopped with 5 mM DTT. These samples were designated PR(Cys-X) (X = H or labeling group). The PR(Cys-X) samples were centrifuged, and the supernatants were designated S-1. The resulting pellets were resuspended in Buffer II (PR) and centrifuged to obtain second supernatants (S-2). The pellets were then extracted in the hypotonic buffer containing 150 µM GTP and designated PR-GTP. The PR-GTP samples were centrifuged as above, and the supernatants (S-GTP) were separated from the final pellets, which were resuspended in Buffer II and designated Pf. During the course of these experiments, aliquots were taken from PR(Cys-X), S-1, PR, S-2, S-GTP, and Pf for analysis by SDS-polyacrylamide gel electrophoresis and Western blot.

Cross-Linking  with  o-PDM  and  p-PDM 

Transducin (0.2 µM) was incubated with 2 mM o-PDM or p-PDM for 30 min at room temperature in 100 mM HEPES (pH 8.0). The function of the modified enzyme was assessed by determining its light-dependent [3H]GMP-PNP binding and GTPase activities. Both transducin functional units were also reacted with 2 mM o-PDM or p-PDM in a similar fashion, and at designated time intervals (0-60 min) two aliquots of the mixture were removed and the reactions terminated with 17 mM DTT. One aliquot was analyzed by SDS-polyacrylamide gel electrophoresis, and the second was assayed for [3H]GMP-PNP binding after holoenzyme reconstitution with the complementary native functional unit.
Spontaneous Cross-Linking of Tα

Tα was dialyzed against either 25 mM sodium phosphate (pH 6.8) or 50 mM Tris-HCl (pH 8.0), each containing 5 mM magnesium acetate. Parallel experiments were performed in the same buffers containing 5 mM β-mercaptoethanol. Tα was then incubated with Tβγ to re-form the holoenzyme, and the light-dependent GTP-hydrolytic activity of the reconstituted protein was determined in the absence or presence of 2 mM DTT. To isolate the peptides containing the residues involved in the spontaneous formation of disulfide linkages in the α-subunit, Tα (0.3 mg) was dialyzed at pH 6.8, treated with DTT, modified with [3H]IAA, and digested with trypsin, and the peptides were separated by HPLC. A similar procedure was carried out in the absence of DTT.

GTPase  Assay 

GTP hydrolysis assays were performed as described by Franke et al. (1992), using 0.2 µM native or reconstituted transducin, 0.1 µM rhodopsin (as EDTA-washed ROS membranes), and 20 µM [32P]GTP in Buffer I.
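GTPase data of this kind are often reduced to picomoles of GTP hydrolyzed by taking the fraction of radioactivity released as free 32Pi and multiplying it by the total GTP in the tube. The sketch below assumes that convention, a γ-labeled substrate, and a hypothetical 50 µL reaction volume (the volume is not given above); it is not necessarily the calculation used by Franke et al. (1992), and the counts are invented.

```python
def pmol_gtp_hydrolyzed(pi_cpm: float, total_cpm: float, gtp_um: float,
                        volume_ul: float, blank_cpm: float = 0.0) -> float:
    """pmol GTP hydrolyzed, from 32Pi released out of (assumed) [gamma-32P]GTP.

    pi_cpm:    counts recovered as free 32Pi in the sample
    total_cpm: counts of the whole [32P]GTP input
    blank_cpm: 32Pi counts in a control without enzyme (background)
    """
    if total_cpm <= 0:
        raise ValueError("total_cpm must be positive")
    fraction = max(pi_cpm - blank_cpm, 0.0) / total_cpm
    total_pmol = gtp_um * volume_ul  # uM * uL = pmol
    return fraction * total_pmol


def turnover_per_transducin(pmol_hydrolyzed: float, transducin_um: float,
                            volume_ul: float) -> float:
    """mol GTP hydrolyzed per mol transducin in the same reaction volume."""
    return pmol_hydrolyzed / (transducin_um * volume_ul)


if __name__ == "__main__":
    # Invented counts; 20 uM GTP and 0.2 uM transducin as in the assay above.
    hydrolyzed = pmol_gtp_hydrolyzed(pi_cpm=5_500, total_cpm=50_000,
                                     gtp_um=20.0, volume_ul=50.0, blank_cpm=500)
    print(hydrolyzed, turnover_per_transducin(hydrolyzed, 0.2, 50.0))
```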

Electrophoresis  on  Polyacrylamide  Gels  with  SDS 

Electrophoresis  on  polyacrylamide  slab  gels  (10%,  1.5  mm  thick)  was  carried  out  in 
the  presence  of  SDS  according  to  Laemmli  (1970). 

Western  Blot  Analyses 

Following SDS-polyacrylamide gel electrophoresis, the proteins were electrotransferred to nitrocellulose. The filters were processed as described by Towbin et al. (1979) using polyclonal antibodies against transducin raised in mice (Bubis et al., 1993). These antibodies recognize Tα very specifically and also cross-react with rhodopsin.

HPLC  Separations 

HPLC was carried out on either a Hewlett-Packard HP 1090 liquid chromatograph or a Waters 625 LC System using a Merck LiChrospher 100 RP-8 (5 µm) column. The solvents were 10 mM sodium phosphate (pH 6.8) in one vessel and acetonitrile in the second vessel. Peptides modified with [3H]IAA, AMDA, and VP were further purified by rechromatography on a Hibar LiChrosorb RP-18 (5 µm) column (0.40 × 25 cm) with a different buffer system: 0.1% trifluoroacetic acid in one vessel and 0.1% trifluoroacetic acid in acetonitrile in the second. Absorbance was routinely monitored at 210 nm. For VP-modified peptides, a photodiode array detector (Waters 990 Series) was used to monitor absorbance between 200 and 300 nm. For AMDA-modified peptides, fluorescence was monitored with a Bio-Rad model 1700 flow-through fluorimeter (excitation: 350 nm; emission cut-off filter: 440 nm). [3H]IAA-modified peptides were identified by scintillation counting.

Sequencing 

Gas-phase  sequencing  was  carried  out  on  an  Applied  Biosystems  protein  sequenator. 
Phenylthiohydantoin  amino  acids  were  identified  by  HPLC,  as  described  by  Matsudaira 
(1987). 
RESULTS

Transducin  Labeling  with  Sulfhydryl  Group-Specific  Reagents 

The role of transducin sulfhydryl groups was examined by chemical modification with five reagents of different sizes and ionic properties: AMDA, DM, VP, NTCBA, and IAA. AMDA, NTCBA, and IAA are hydrophilic compounds, whereas DM and VP are hydrophobic. Furthermore, AMDA and DM are bulkier than VP, IAA, and NTCBA. In the case of NTCBA, only a very small group (-CN) is incorporated onto the reactive cysteine(s) of the protein.

All these reagents inhibited the rhodopsin-dependent guanine nucleotide binding activity of transducin. Figure 1 shows the kinetics of inactivation of the [3H]GMP-PNP binding activity of transducin by modification with NTCBA. Modification with NTCBA was carried out in the presence of different concentrations of β-mercaptoethanol (1.4, 2, or 3 mM) (Fig. 1). The reducing agent protected against inactivation in a concentration-dependent manner, indicating the specificity of the modification reaction. Similar results were obtained with the other four reagents (Data not shown). Figure 1 also shows that the solvent (5% DMF) had no effect on transducin guanine nucleotide binding. For modification with IAA, we used [3H]IAA in the reaction mixture and were able to measure the stoichiometry of incorporation of the reagent into the protein, which was 1.3 mol of [3H]IAA per mol of transducin (Data not shown).
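Two calculations underlie this paragraph. Inactivation time courses such as the one in Fig. 1 are commonly summarized by a pseudo-first-order rate constant, taken from the slope of ln(residual activity) against time; the text does not state that this model was used, so `pseudo_first_order_k` is an illustrative assumption. The labeling stoichiometry follows from converting counts to moles with the specific activity of [3H]IAA (131 mCi/mmol, i.e. 0.131 Ci/mmol) and 1 Ci = 2.22 × 10^12 dpm; the counting efficiency and all sample values below are hypothetical.

```python
import math

DPM_PER_CI = 2.22e12  # 1 Ci = 3.7e10 Bq = 2.22e12 disintegrations per minute


def pseudo_first_order_k(times_min, residual_fraction):
    """Least-squares slope through the origin of ln(A/A0) vs t; returns k in 1/min.

    Assumes first-order loss of activity in excess reagent (an assumption, not
    a model stated in the text). Points at t = 0 carry no information and are skipped.
    """
    pairs = [(t, f) for t, f in zip(times_min, residual_fraction) if t > 0]
    if not pairs:
        raise ValueError("need at least one time point > 0")
    if any(f <= 0 for _, f in pairs):
        raise ValueError("residual activity must be positive to take logarithms")
    num = sum(t * math.log(f) for t, f in pairs)
    den = sum(t * t for t, _ in pairs)
    return -num / den


def label_per_protein(cpm, counting_efficiency, specific_activity_ci_per_mmol,
                      protein_nmol):
    """mol of radiolabel incorporated per mol of protein."""
    dpm = cpm / counting_efficiency                      # correct for counting efficiency
    label_mmol = dpm / (specific_activity_ci_per_mmol * DPM_PER_CI)
    return label_mmol * 1e6 / protein_nmol               # mmol -> nmol, then ratio


if __name__ == "__main__":
    # Hypothetical residual binding (fraction of control) at 0-60 min.
    times = [0, 10, 20, 30, 60]
    residual = [1.0, 0.62, 0.37, 0.23, 0.05]
    k = pseudo_first_order_k(times, residual)
    print(f"k = {k:.3f} 1/min, t1/2 = {math.log(2) / k:.1f} min")
    # Hypothetical counts for 1 nmol transducin labeled with [3H]IAA (0.131 Ci/mmol),
    # assuming 40% tritium counting efficiency.
    print(label_per_protein(cpm=151_226.4, counting_efficiency=0.40,
                            specific_activity_ci_per_mmol=0.131, protein_nmol=1.0))
```

With these made-up counts the second call returns about 1.3 mol/mol, the value reported above; in practice the efficiency would come from quench-corrected standards.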

Interaction  between  T(Cys-X)  and  Photoexcited  Rhodopsin 

Using the sedimentation assay described under Experimental Procedures, we were able to determine whether modified transducin was capable of interacting with R* and with GTP. Sedimentation experiments followed by SDS-polyacrylamide gel electrophoresis and Western blot analyses showed that modification with AMDA or VP hindered the binding of transducin to R*. As illustrated in figure 2, VP-modified transducin lost its rhodopsin binding capacity: it was completely recovered in the supernatant after the first centrifugation (S-ISO), and the resuspended pellets (PR-GTP and Pf) contained only rhodopsin. Similar results were observed with AMDA-modified transducin (Data not shown). On the other hand, NTCBA-modified transducin was still able to interact with R*, as shown in figure 3 by SDS-polyacrylamide gel electrophoresis (Panel A) and Western blot (Panel B). NTCBA-modified transducin was not extracted by the isotonic (S-ISO) or hypotonic (S-HYPO) washes, and transducin was recovered in the resuspended pellet obtained after both of these washes (PR-GTP). However, this interaction was maintained even in the presence of GTP: no transducin was observed in S-GTP, and all of it was recovered in the final pellet (Pf). DM- and IAA-modified transducin showed gel and Western blot patterns identical to those of NTCBA-labeled transducin (Data not shown).

Figure 1. Inactivation of transducin GMP-PNP binding activity by modification with NTCBA. Transducin was incubated with 5% DMF (A) as a control, or with 5 mM NTCBA in the presence of either 1.4 (A), 2 (▼), or 3 mM β-mercaptoethanol (V). At the indicated time intervals, the reactions were terminated with 5 mM DTT and assayed for [3H]GMP-PNP binding.

Figure 2. Interaction of photoexcited rhodopsin with VP-modified transducin. Transducin labeled with VP was incubated with illuminated washed ROS membranes. The supernatants and pellets produced by the sedimentation assay described under Experimental Procedures were analysed by SDS-polyacrylamide gel electrophoresis. The abbreviations used in the figure are explained in the text. M = molecular weight markers.

The results observed for the samples of transducin modified with NTCBA, IAA, and DM could also be explained by non-specific precipitation of the protein due to denaturation. To examine this possibility, parallel experiments were performed with transducin incubated with NTCBA, IAA, or DM, but the sedimentation assays were carried out without washed ROS membranes. We observed that DM precipitated the protein (Data not shown): DM-modified transducin was found in the sedimented pellets (PR-GTP and Pf) even in the absence of rhodopsin. However, NTCBA- and IAA-modified transducin behaved like unmodified transducin in the absence of washed ROS membranes. These results indicate that NTCBA- and IAA-modified transducin interact specifically with R*.

Figure 3. Interaction of photoexcited rhodopsin with NTCBA-modified transducin. Transducin labeled with NTCBA was incubated with illuminated washed ROS membranes. The supernatants and pellets produced by the sedimentation assay described under Experimental Procedures were analysed by SDS-polyacrylamide gel electrophoresis (Panel A) and Western blot (Panel B). The abbreviations used in the figure are explained in the text. M = molecular weight markers.

Figure 4. Transducin functionality in transducin-illuminated rhodopsin complexes treated with the sulfhydryl group-specific labels. T:R*, control; (T:R*)VP, modification with VP; (T:R*)IAA, modification with IAA; (T:R*)NTCBA, modification with NTCBA. Transducin-photoexcited rhodopsin complexes were incubated with the different cysteine modification reagents. The supernatants and pellets produced by the sedimentation assay described under Experimental Procedures were analysed by SDS-polyacrylamide gel electrophoresis. The abbreviations used in the figure are explained in the text. M = molecular weight markers.
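The reasoning used to read the sedimentation assays can be written as a small decision procedure. The sketch below encodes the cases described in this section (recovery in S-ISO, release into S-GTP, retention in the final pellet, and the control without ROS membranes); the class and function names are hypothetical, and real blots need judgment about partial distributions that these yes/no flags ignore.

```python
from dataclasses import dataclass


@dataclass
class SedimentationOutcome:
    """Where transducin was detected (by SDS-PAGE / Western blot) in one assay."""
    in_s_iso: bool             # first isotonic supernatant
    in_s_gtp: bool             # supernatant after the GTP extraction
    in_final_pellet: bool      # Pf
    pellets_without_ros: bool  # control run without washed ROS membranes


def interpret(o: SedimentationOutcome) -> str:
    """Hypothetical summary of the interpretation rules described in the text."""
    if o.pellets_without_ros:
        # Sediments even without rhodopsin: precipitation, not R* binding (DM case).
        return "non-specific precipitation"
    if o.in_s_iso:
        # Never reaches the membrane pellet (VP and AMDA cases).
        return "no binding to R*"
    if o.in_s_gtp:
        # Binds R* and is released by GTP (functional complex).
        return "binds R*, released by GTP"
    if o.in_final_pellet:
        # Binds R* but GTP fails to release it (NTCBA and IAA cases).
        return "binds R*, not released by GTP"
    return "inconclusive"


if __name__ == "__main__":
    vp = SedimentationOutcome(True, False, False, False)
    ntcba = SedimentationOutcome(False, False, True, False)
    dm = SedimentationOutcome(False, False, True, True)
    for name, outcome in (("VP", vp), ("NTCBA", ntcba), ("DM", dm)):
        print(name, "->", interpret(outcome))
```

The order of the checks matters: the no-ROS control is tested first, because a precipitated protein would otherwise be misread as one that binds R* but is not released by GTP.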

Transducin  Functionality  in  Transducin-Illuminated  Rhodopsin 
Complexes  Treated  with  the  Sulfhydryl  Group-Specific  Labels 

T-R* complexes were incubated with the sulfhydryl group-specific reagents, and the dissociation of these complexes in the presence of GTP was evaluated. As shown by SDS-polyacrylamide gel electrophoresis in figure 4, T-R* complexes modified with NTCBA or IAA behaved like unmodified T-R*, allowing the dissociation of transducin in the presence of GTP; transducin was recovered in the supernatants after treatment with GTP (S-GTP). These results indicate that rhodopsin protected against the inactivation previously observed for NTCBA- and IAA-modified transducin (Figs. 1 and 3). AMDA-treated T-R* gave results identical to those of IAA- and NTCBA-treated T-R* by SDS-polyacrylamide gel electrophoresis (Data not shown). In contrast, VP-modified T-R* was unable to interact with GTP: for VP-treated T-R*, transducin remained in the sedimented pellets (PR and Pf) (Fig. 4). Because DM precipitated transducin, DM was excluded from this set of experiments.

HPLC  Separations  of  Tryptic  Peptides  from  IAA-,  AMDA-,  and 
VP-Labeled  Transducin  and  Identification  of  the  Cysteine  Residues 
Involved  in  the  Modification 

Transducin was modified with [3H]IAA, dialyzed against 50 mM NH4HCO3 (pH 8.3), and digested with trypsin, and the products were separated by HPLC as described under Experimental Procedures. The HPLC profile showed one major radioactive peak (Data not shown). This peak was rechromatographed using a different buffer system (Fig. 5), and a single 3H-containing peptide was observed. The purified radioactive tryptic peptide was subjected to gas-phase sequencing and, as seen in figure 5, it corresponded to the carboxy-terminal tryptic peptide of Tα (residues 342-350). The radioactivity was released at sequencing cycle 6 of the peptide. This result showed that Cys-347 of Tα is the residue derivatized by IAA in native transducin.
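Locating the label from the sequencing run is simple arithmetic: if a peptide begins at residue s, Edman cycle n (counting from 1) releases residue s + n - 1, so cycle 6 of the peptide starting at residue 342 corresponds to residue 347. The sketch below applies this to per-cycle radioactivity; the counts in the usage example are invented, and real runs also show lag and carry-over into later cycles that a simple maximum does not model.

```python
def labeled_residue(first_residue: int, cpm_per_cycle: list[float]) -> int:
    """Residue number released at the Edman cycle with the highest radioactivity.

    cpm_per_cycle[0] is cycle 1, which releases `first_residue`.
    """
    if not cpm_per_cycle:
        raise ValueError("no sequencing cycles given")
    peak_index = max(range(len(cpm_per_cycle)), key=cpm_per_cycle.__getitem__)
    return first_residue + peak_index  # cycle n -> first_residue + n - 1


if __name__ == "__main__":
    # Invented counts for a nine-cycle run of the peptide starting at residue 342.
    counts = [120, 110, 95, 100, 130, 2400, 600, 300, 150]
    print("labeled residue:", labeled_residue(342, counts))
```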


Figure 5. Rechromatogram of the major [3H]IAA-labeled tryptic peptide of transducin. The carboxymethylated [3H]peptide was rechromatographed on a Hibar LiChrosorb RP-18 column. The buffers employed were: A) 0.1% trifluoroacetic acid and B) 0.1% trifluoroacetic acid in CH3CN. The peptides were eluted isocratically in buffer A for 10 min, and then with a 120-min linear gradient from 0-40% B. Top Panel, radioactivity. Bottom Panel, absorbance at 210 nm. The purified peptide was subjected to gas-phase sequencing, and the sequence obtained is shown in the Top Panel.
Figure 6. HPLC separation of the tryptic digest of AMDA-modified transducin. Transducin was incubated with AMDA and digested with trypsin. The resulting peptides were separated by HPLC on a LiChrospher 100 RP-8 column. The buffers employed were: A) 10 mM sodium phosphate (pH 6.8) and B) CH3CN. The peptides were eluted with a 180-min linear gradient from 0-40% B, and then with a 30-min linear gradient from 40-60% B. Panel A, absorbance at 210 nm. Panel B, fluorescence.



Figure 7. HPLC elution profile of tryptic peptides of VP-modified transducin. Transducin labeled with VP was digested with trypsin. The peptides were separated by HPLC as described in Fig. 6, and their elution was monitored at 210 (Panel A), 254 (Panel B), and 280 nm (Panel C).
Transducin was also modified with AMDA and VP on a preparative scale and digested as described for IAA-labeled transducin. Figure 6 illustrates the HPLC separation of the tryptic digest of AMDA-modified transducin. The fluorescence profile obtained for the AMDA-treated sample was similar to the radioactivity profile obtained for [3H]IAA-modified transducin, with a single major fluorescent peak (Fig. 6, Panel B). In contrast, HPLC separation of the peptides from VP-modified transducin revealed pyridylethyl cysteine in at least four different tryptic peptides, as determined by their absorbance at 254 nm (Fig. 7, Panel B). We also monitored the absorbance at 280 nm to discriminate peptides containing aromatic amino acid residues from those containing the derivatized cysteines (Fig. 7, Panel C). These results showed clear differences between AMDA and VP in the labeling of transducin sulfhydryl groups.
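One way to formalize the use of two wavelengths is to flag peaks whose absorbance at 254 nm, where the pyridylethyl group absorbs, is high relative to their absorbance at 280 nm, where aromatic residues dominate. The ratio threshold in this sketch is a hypothetical value that would have to be calibrated against standards, a peptide carrying both a pyridylethyl cysteine and a tryptophan could be missed, and the peak values are invented.

```python
def flag_pyridylethyl_peaks(peaks, ratio_threshold=2.0):
    """Return peak names whose A254/A280 ratio exceeds `ratio_threshold`.

    peaks: iterable of (name, a254, a280) tuples for the same HPLC peak.
    ratio_threshold: hypothetical cut-off, not a value from the text.
    """
    flagged = []
    for name, a254, a280 in peaks:
        if a254 <= 0:
            continue  # no 254 nm signal, nothing to assign
        if a280 <= 0 or a254 / a280 > ratio_threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Invented peak heights (absorbance units).
    peaks = [("p1", 0.50, 0.10), ("p2", 0.30, 0.40),
             ("p3", 0.20, 0.00), ("p4", 0.00, 0.20)]
    print(flag_pyridylethyl_peaks(peaks))
```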

Derivatization  of  Transducin  Carboxyl  Groups 

Chemical modification was also used to examine functional acidic amino acids of transducin. We used two different carbodiimides: EDC, a water-soluble compound, to probe the role of solvent-accessible carboxyl groups in the protein, and DCCD, a non-polar carbodiimide that partitions into hydrophobic environments of proteins. As seen in figure 8, EDC inhibited the [3H]GTP binding activity of holotransducin and its isolated subunits: transducin, Tα, and Tβγ all showed almost complete inhibition (more than 80%) of their light-dependent guanine nucleotide binding in the presence of EDC. In contrast, the holoenzyme and the βγ-complex were only slightly affected by treatment with DCCD, which inhibited their activity by only about 10% (less in the case of Tβγ) (Fig. 8). DCCD-modified Tα showed 40% inactivation, more than four times the inhibition observed for transducin and Tβγ treated with DCCD. Similar results were obtained whether the experiments were performed in the presence of 20 or 30 mM magnesium acetate. The kinetics of the modification of transducin, Tα, and Tβγ with EDC and DCCD, and the effect on the guanine nucleotide binding activity of the protein, are shown in figure 9.

Figure 8. Effect of EDC and DCCD treatment of transducin, Tα, and Tβγ on the [3H]GTP binding activity of native and reconstituted holoenzyme. The bottom Panel shows the reagents and buffers used for each experiment. MgAc = magnesium acetate.

Figure 9. Time course of the modification of transducin, Tα, and Tβγ with EDC and DCCD. Control experiments contained either 8% acetonitrile or 50 mM PIPES and 30 mM magnesium acetate, for DCCD and EDC labeling experiments, respectively. Control (•); Transducin (□); Tα (♦); Tβγ (A).
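The inhibition figures quoted here are expressed relative to untreated controls. A minimal sketch of that arithmetic follows; the activity values are hypothetical, chosen only to reproduce the reported percentages, and `percent_inhibition` and `fold_difference` are illustrative names.

```python
def percent_inhibition(treated_activity: float, control_activity: float) -> float:
    """Percent loss of activity relative to an untreated control."""
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - treated_activity / control_activity)


def fold_difference(inhibition_a: float, inhibition_b: float) -> float:
    """How many times larger inhibition_a is than inhibition_b."""
    if inhibition_b <= 0:
        raise ValueError("reference inhibition must be positive")
    return inhibition_a / inhibition_b


if __name__ == "__main__":
    # Hypothetical [3H]GTP binding values, control = 100.
    dccd_alpha = percent_inhibition(60.0, 100.0)  # ~40%, as reported for T-alpha
    dccd_holo = percent_inhibition(90.0, 100.0)   # ~10%, as reported for transducin
    print(dccd_alpha, dccd_holo, fold_difference(dccd_alpha, dccd_holo))
```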

Derivatization  of  Transducin  Lysyl  Residues 

Functional lysines of transducin were examined by chemical modification with five different reagents: PITC, OPA, AA, CA, and DnsCl. As illustrated in figure 10, all lysine modification reagents except PITC produced functional inactivation of transducin (more than 60% inhibition of GTP binding activity). Incubation of Tα or Tβγ with PITC or OPA also resulted in more than 60% inactivation of the reconstituted holoenzyme. On the other hand, AA, CA, and DnsCl caused inactivation of the reconstituted enzyme with modified Tα but not with modified Tβγ. Figure 11 shows the time course of inactivation of transducin by OPA (Panel A), AA (Panel B), CA (Panel C), and DnsCl (Panel D), with DnsCl displaying the fastest inhibition.

Figure 10. Effect on the [3H]GTP binding activity of native and reconstituted holoenzyme after treatment of transducin, Tα, and Tβγ with lysyl group-specific reagents. The bottom Panel shows the reagents and buffers used for each experiment. ME = β-mercaptoethanol. MgAc = magnesium acetate.

Cross-Linking  of  Transducin  by  Sulfhydryl  Group-Specific  Bifunctional 
Reagents 

Two sulfhydryl group-specific bifunctional labels, o-PDM and p-PDM, were used as cross-linking agents for transducin and transducin subunits. As seen in Figure 12 (Panel A), incubation of Tα with o-PDM or p-PDM resulted in the formation of high molecular weight oligomers, as well as bands that migrated with apparent molecular masses of 37 and 35 kDa. Incubation of Tβγ with either reagent produced a new major species of 46 kDa, which resulted from cross-linking between Tβ and Tγ (Fig. 12, Panel B). Transducin modified with o-PDM or p-PDM showed complete inactivation of its [³H]GMP-PNP binding and GTPase activities (data not shown). To determine which unit was responsible for the observed inhibition of transducin function, we carried out holoenzyme reconstitution experiments combining modified with native transducin functional units. The combination of intact Tα and o-PDM-treated Tβγ reconstituted native transducin GMP-PNP binding activity, indicating that formation of the 46 kDa species did not affect the function of the reconstituted protein. However, o-PDM-modified Tα incubated with intact Tβγ exhibited more than 90% inhibition of native transducin enzymatic function (data not shown). These results implied that the formation of intra- and/or intermolecular cross-links in Tα was responsible for the observed inactivation. Similar results were obtained when the individual functional units of transducin were treated with p-PDM (data not shown).

Figure 11. ... Panel C, modification with CA; Panel D, modification with DnsCl. Control experiments contained 8% acetonitrile.

Figure 12. Time course of the cross-linking of Tα and Tβγ with o-PDM and p-PDM. Tα and Tβγ were incubated with 2 mM o-PDM or p-PDM. At the indicated time intervals, an aliquot of the reaction mixture was quenched with 17 mM DTT and analyzed by SDS-polyacrylamide gel electrophoresis. Panel A, Tα; Panel B, Tβγ. M = molecular weight markers.
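The apparent molecular masses quoted for the cross-linked species (37, 35 and 46 kDa) are read against the marker lane (M) in Figure 12. The section does not describe how they were calculated; a common approach, assumed here, is a standard curve of log10(molecular mass) versus relative mobility (Rf) of the marker bands. The sketch below fits that line and interpolates an unknown band; the function names, Rf values and marker masses are all hypothetical.

```python
import math


def fit_standard_curve(markers):
    """Least-squares fit of log10(mass) = slope * Rf + intercept.

    markers: list of (rf, mass_kda) pairs for the molecular weight markers.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mass) for _, mass in markers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apparent_mass_kda(rf, slope, intercept):
    """Interpolate the apparent mass (kDa) of a band from its relative mobility."""
    return 10 ** (slope * rf + intercept)


# Hypothetical markers lying exactly on log10(mass) = 2.3 - 1.2 * Rf
markers = [(rf, 10 ** (2.3 - 1.2 * rf)) for rf in (0.2, 0.4, 0.6, 0.8)]
slope, intercept = fit_standard_curve(markers)
band = apparent_mass_kda(0.5, slope, intercept)
print(f"band at Rf 0.5 = {band:.1f} kDa")
```

The semi-log relation holds only over a limited mass range, and cross-linked species may migrate anomalously, so values obtained this way are apparent rather than true masses.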

Spontaneous Formation of Disulfide Bonds in Tα

Spontaneous oxidation of sulfhydryl groups and formation of disulfide bonds in Tα were examined at different pH values under non-reducing conditions. Dialysis of Tα at pH 8.0, followed by reconstitution with native Tβγ, completely inactivated the GTPase activity of the reconstituted enzyme (Fig. 13). However, native GTP-hydrolytic activity was recovered after incubation of the reconstituted holoenzyme with 2 mM DTT (Fig. 13). Dialysis of Tα at pH 6.8, followed by reconstitution with native Tβγ, reduced GTPase activity by only about 30-40% (Fig. 14), and, as before, the addition of 2 mM DTT restored native transducin GTPase activity. As seen in Figure 14, Tα dialyzed in the presence of 5 mM β-mercaptoethanol and reconstituted with intact βγ-complex exhibited native GTP-hydrolytic capacity in the presence or absence of 2 mM DTT.
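The activity losses described above (complete inactivation after dialysis at pH 8.0, about 30-40% after dialysis at pH 6.8, and recovery with DTT) are all expressed relative to an untreated control. A minimal helper for that arithmetic is sketched below, assuming sample and control activities are measured in the same units; the function name and the numbers are illustrative only.

```python
def percent_inhibition(activity, control_activity):
    """Percent loss of activity relative to an untreated control."""
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - activity / control_activity)


# Illustrative values only (arbitrary units of GTP hydrolyzed)
control = 100.0
print(percent_inhibition(65.0, control))  # partial loss, inside a 30-40% range
print(percent_inhibition(0.0, control))   # complete inactivation
print(percent_inhibition(98.0, control))  # activity largely restored, e.g. after DTT
```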
Figure 13. GTPase activity of Tα dialyzed at pH 8.0 and reconstituted with Tβγ. Shown is the light-dependent GTP-hydrolytic activity of the reconstituted enzyme in the absence (■) or presence (□) of 2 mM DTT.


To localize the cysteines involved in the spontaneous formation of disulfide linkages, Tα was dialyzed at pH 6.8 and two aliquots containing 0.3 mg of protein were taken; DTT (2 mM) was added to one of the samples. Both samples were then incubated with [³H]IAA and digested with trypsin, and the products were separated by HPLC. Figure 15 shows the resulting absorbance and radioactivity HPLC profiles. One peptide was alkylated by IAA in both cases (peptide II, Panels A and C). However, the sample containing DTT showed three new radioactively labeled peptides (peptides I, III, and IV). Gas-phase sequencing of peptides IIa and IIc (Fig. 16) identified Cys-135 of Tα as the residue carboxymethylated by IAA in the presence or absence of DTT. Sequencing of peptides I and III showed that both contained Cys-347 of Tα, peptide III being an incomplete proteolytic product of peptide I. These results identified Cys-347 of Tα as one of the cysteines involved in the spontaneous formation of disulfide cross-links in the α-subunit of transducin. Sequence analysis of peptide IV identified Cys-321 of Tα as the carboxymethylated residue. This cysteine may also participate in Tα disulfide formation; however, the amount of [³H]IAA incorporated into peptide IV is lower than the level of alkylation observed for peptides I and III (Fig. 15). As seen in Figure 15, there are other minor ³H-containing tryptic peptides in Tα dialyzed in the absence of DTT.

Figure 14. GTPase activity of Tα dialyzed at pH 6.8 and reconstituted with Tβγ. Tα was dialyzed in 25 mM sodium phosphate (NaPi), 5 mM magnesium acetate (Mg-acetate), pH 6.8 (Right Panel), or in the same buffer containing 2 mM β-mercaptoethanol (2-MSH) (Left Panel). Shown is the light-dependent GTP-hydrolytic activity measured in the absence (■, ♦) or presence (□, ◇) of 2 mM DTT, after reconstitution with the βγ-complex.

Figure 15. HPLC separation of tryptic peptides from Tα dialyzed at pH 6.8 and modified with [³H]IAA. The peptides were separated by HPLC as described in Fig. 6. Panels A and B, radioactivity and absorbance at 210 nm, respectively, of Tα dialyzed in the absence of DTT. Panels C and D, radioactivity and absorbance at 210 nm, respectively, of Tα dialyzed in the presence of DTT. The major radioactive tryptic peptides are designated with Roman numerals.


DISCUSSION 

Five different sulfhydryl group-specific labels were used for the chemical modification of transducin, and in all cases the protein was completely inactivated. The kinetics of modification of transducin cysteines with AMDA, DM, VP, NTCBA and IAA showed that the loss of GMP-PNP binding activity occurred immediately upon reaction.

Figure 16. Sequencing of carboxymethylated tryptic peptides from Tα dialyzed at pH 6.8 in the presence or absence of DTT.

With the exception of DM, which caused the precipitation of transducin, the sulfhydryl group-specific reagents were used to examine whether R* was capable of interacting productively with labeled transducin. Modification of transducin with AMDA or VP hindered the binding of transducin to R*, and the loss of guanine nucleotide binding activity observed for AMDA- and VP-transducin must be attributed to this blockade. In contrast, NTCBA- and IAA-modified transducin behaved like untreated transducin and were capable of interacting with illuminated rhodopsin. These results can be explained by the small size of the groups attached to the cysteine: -CN from NTCBA and carboxymethyl from IAA. However, GTP was not able to dissociate the complex between NTCBA- or IAA-modified transducin and R*. The GDP/GTP exchange reaction of transducin induced by illuminated rhodopsin was probably sterically prevented by modification with either NTCBA or IAA. Thus, with NTCBA- and IAA-labeled transducin, we were able to stabilize and freeze a transducin-rhodopsin complex intermediate that could not dissociate even after incubation with GTP.

AMDA, NTCBA and IAA are similar in nature: all three compounds are charged and polar at our working pH. Because of their hydrophilic character, these reagents can modify cysteine(s) located on the external surface of the protein. In particular, sulfhydryl groups located close to the receptor binding site of transducin are excellent targets for these compounds. The three reagents may also modify the same amino acid residue(s) in transducin. Nevertheless, AMDA-modified transducin behaved differently from NTCBA- or IAA-modified transducin in the sedimentation assays with R*. As explained above, the small cyanide and carboxymethyl groups derived from NTCBA and IAA still allow transducin to interact with R*. In contrast, the labeling group incorporated into transducin by AMDA is very bulky and produces steric hindrance that prevents rhodopsin binding.

Chemical modifications performed on T-R* complexes were used to explore the ability of rhodopsin to protect against the inactivation of transducin function. Rhodopsin protected transducin when AMDA, NTCBA and IAA were used as labeling reagents, but not when VP was employed. The more hydrophilic compounds, AMDA, NTCBA and IAA, did not affect transducin function when the T-R* complex was formed prior to the modification reactions. However, the nonpolar reagent VP inactivated guanine nucleotide binding by transducin even in the presence of photoexcited rhodopsin. These results favor the hypothesis that AMDA, NTCBA and IAA modify the same cysteine(s) in transducin.

Transducin carboxymethylated with [³H]IAA under native conditions incorporated approximately one mole of the compound per mole of protein. HPLC separation of [³H]IAA-labeled tryptic peptides of transducin also revealed that a single peptide was alkylated. The labeled amino acid residue corresponded to Cys-347 of Tα. Therefore, the modification of this cysteine must be responsible for the complete inhibition of transducin function observed after IAA alkylation. HPLC separation of AMDA-labeled tryptic peptides of transducin also showed one major modified peptide, as with IAA. Although the cysteines modified by AMDA and NTCBA were not mapped in this study, Cys-347 of Tα is a good candidate, given the similar polar properties shared by AMDA, NTCBA, and IAA. In contrast, several VP-labeled tryptic peptides (at least four) were identified in the HPLC profile of trypsin-digested VP-modified transducin. Even if Cys-347 of Tα is one of the residues modified by VP, this compound clearly labels more than one sulfhydryl group in transducin. As discussed above, R* did not protect against transducin inactivation by VP. Thus, some of the cysteines modified with the pyridylethyl group in transducin may be located in regions other than the receptor binding site, for example in the guanine nucleotide binding pocket of the protein. VP may also label cysteine(s) located in regions where the modification sterically hinders the GDP/GTP exchange induced by rhodopsin, preventing either the exit of GDP or the entrance of GTP. These results show that transducin sulfhydryl groups are differentially labeled depending on the hydrophilicity or hydrophobicity of the reagent.
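The figure of approximately one mole of [³H]IAA per mole of transducin is a ratio of moles of label (radioactivity divided by the reagent's specific radioactivity) to moles of protein (mass divided by molecular weight). The calculation itself is not given in this section; the sketch below shows the usual arithmetic under that assumption, with hypothetical parameter names and illustrative numbers that are not data from this study.

```python
def label_stoichiometry(dpm_incorporated, specific_activity_dpm_per_mol,
                        protein_mass_g, molecular_weight_g_per_mol):
    """Moles of label incorporated per mole of protein.

    dpm_incorporated: radioactivity bound to the protein (disintegrations/min).
    specific_activity_dpm_per_mol: specific radioactivity of the labeling reagent.
    """
    mol_label = dpm_incorporated / specific_activity_dpm_per_mol
    mol_protein = protein_mass_g / molecular_weight_g_per_mol
    return mol_label / mol_protein


# Illustrative numbers only: 0.2 mg of a hypothetical 80 kDa protein (2.5 nmol),
# a reagent of 1e15 dpm/mol, and 2.5e6 dpm recovered in the protein
ratio = label_stoichiometry(2.5e6, 1e15, 0.2e-3, 80_000)
print(f"{ratio:.2f} mol label per mol protein")
```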

Ho & Fung (1984) examined the role of the sulfhydryl groups of transducin by 5,5'-dithiobis-(2-nitrobenzoic acid) (DTNB) titration and N-ethylmaleimide (NEM) modification. They, as well as Reichert & Hofmann (1984), showed that derivatization of a reactive sulfhydryl by NEM or DTNB inhibited rhodopsin-catalyzed GDP/GTP exchange. In addition, Hofmann & Reichert (1985) showed that replacing the thionitrobenzoate derivative with the less bulky cyanide reversed the hindrance of transducin binding to photoexcited rhodopsin, similar to our observations when NTCBA was used as the alkylating reagent. Neither Ho & Fung (1984) nor Reichert & Hofmann (1984) and Hofmann & Reichert (1985) mapped the reactive cysteine derivatized by NEM or DTNB in the primary sequence of transducin.

Dhanasekaran et al. (1988) used [¹²⁵I]N-(3-iodo-4-azidophenylpropionamido)-S-(2-thiopyridyl)cysteine ([¹²⁵I]ACTP) to derivatize reduced sulfhydryls of transducin and showed the incorporation of 1-1.3 mol of the compound into Tα. They found that both Cys-347 and Cys-210 in Tα were derivatized by [¹²⁵I]ACTP, in a ratio of approximately 70% to 30%, respectively. The modification of these two reactive cysteines inhibited rhodopsin-catalyzed activation of transducin. Van Dop et al. (1984) showed that ADP-ribosylation of transducin by pertussis toxin blocked the light-stimulated hydrolysis of GTP. Subsequently, West et al. (1985) identified Cys-347 as the site of pertussis toxin-catalyzed ADP-ribosylation in transducin. These studies support our identification of Cys-347 of Tα as one of the functionally important residues in the protein.

Cys-347 is the fourth residue from the carboxyl terminus of Tα (Phe-350). Various reports (Conklin et al., 1993; Weingarten et al., 1990) have shown that the carboxyl-terminal region of several α subunits is particularly important for interaction with the receptor. Hamm et al. (1988) showed that synthetic peptides corresponding to two regions near the carboxyl terminus of Tα, one extending from a Glu to a Val residue and one containing Cys-347, competed with transducin for interaction with rhodopsin. Furthermore, the 11-amino acid peptide from the COOH terminus of Tα (Ile-340 to Phe-350) mimics the effect of transducin in stabilizing the active form of rhodopsin, metarhodopsin II. Dratz et al. (1993) used NMR to study the structure of the interface between excited or unexcited rhodopsin and the carboxyl-terminal peptide of Tα (Ile-340 to Phe-350). They observed conformational differences between the two bound forms and suggested a mechanism for activation of G proteins by agonist-stimulated receptors. Among the changes, the side chain of Cys-347 appears to pivot from pointing outward in the peptide bound to dark rhodopsin to being tucked inward in the structure bound to metarhodopsin II (Dratz et al., 1993). Thus, the insertion of alkylating groups, or of an ADP-ribose group by pertussis toxin, at Cys-347 would be predicted to change the conformation of the COOH terminus of Tα and may affect transducin-rhodopsin interactions.

Chemical modification was also used to examine functional acidic amino acids of transducin. EDC, which is hydrophilic, completely abolished the light-dependent [³H]GTP binding activity of transducin. When Tα or Tβγ was individually treated with EDC and then incubated with the native complementary unit, the GTP binding activity of the reconstituted holoenzyme was also inhibited. Binding to rhodopsin is probably hindered in EDC-modified native and reconstituted transducin, as EDC targets accessible carboxyl groups located on the surface of the protein. EDC labeling of Asp and/or Glu residues located on the individual transducin units may also prevent reconstitution of the holoenzyme. In contrast, transducin and the βγ-complex were only slightly affected by treatment with the hydrophobic carbodiimide DCCD. DCCD-modified Tα showed 40% inactivation, more than four times the inhibition observed for DCCD-treated transducin and Tβγ. We believe that the Asp and/or Glu residues located in or near the metal and nucleotide interaction sites in transducin are the best targets for DCCD labeling. Guanine nucleotide binding sites contain a hydrophobic pocket that constitutes the primary recognition site for the guanine ring. Since Mg²⁺-GDP is strongly bound to purified transducin, the metal-nucleotide complex may protect against DCCD inactivation. Tα, on the other hand, is purified free of nucleotide from the blue agarose column (Shichi et al., 1984), so the acidic residues in its GTP binding pocket would be more susceptible to DCCD labeling. However, Tα was always stored and maintained in buffers containing Mg²⁺, and the metal may partially protect it against modification by DCCD. Furthermore, labeling of Tα with DCCD may also affect its interaction with the βγ-complex, and vice versa, hindering formation of the native holoenzyme.

There are several possible reaction pathways for the interaction of carbodiimides with carboxyl groups on proteins. The reaction of a carbodiimide with an acidic residue first forms an O-acylurea intermediate, which may then give a stable N-acylurea adduct. Alternatively, the O-acylurea may react with a nucleophile, for example water. If, however, the nucleophile is a nearby amino group of an amino acid side chain, an inter- or intramolecular "zero-length" cross-link may be formed (Toner-Webb & Taylor, 1987). As described above, the COOH-terminal peptide of Tα has been shown to be important for interaction with the receptor. The NMR structures of the 11-amino acid peptide from the C-terminus of Tα (Ile-340 to Phe-350) bound to rhodopsin and to metarhodopsin II showed that residue 345, a lysine, can form different salt bridges with carboxyl groups of the same peptide, depending on the state of the receptor (Dratz et al., 1993). Lys-345, which is close to Glu-342 in the peptide bound to dark rhodopsin, appears to be closer to the free C-terminal carboxyl in the peptide bound to activated receptor. The hydrophilic carbodiimide EDC may be cross-linking Glu-342 with Lys-345 in Tα. Such a covalent cross-link would impede the breakage and replacement of salt bridges involving Lys-345. EDC may also be modifying either Glu-342 or the C-terminal carboxyl group, hindering the light-induced conformational change of transducin mediated by interaction with activated rhodopsin. The crystal structures of a 325-amino acid fragment of Tα bound either to GDP (Lambright et al., 1994) or to GTPγS (Noel et al., 1993) have been solved. Both three-dimensional structures of Tα show that several acidic residues are directly involved in the coordination of Mg²⁺, in guanine nucleotide binding, or in the mechanism of GTP hydrolysis: Glu-39, Asp-146, Asp-196, Glu-203, and Asp-268. Labeling of any of these target residues by DCCD would result in the inactivation observed in DCCD-modified Tα.

Five different modification reagents were used to evaluate the existence of functional lysines in transducin. One of the most interesting compounds was PITC, which did not affect the guanine nucleotide binding properties of the modified holoenzyme. Hingorani & Ho (1987) studied the effect of fluorescein 5'-isothiocyanate labeling on transducin function and found no effect on the transducin-rhodopsin interaction or on the binding of GMP-PNP in the presence of R*, similar to our observations with PITC. However, when Tα or Tβγ was incubated with PITC, we observed more than 60% inhibition of the function of the reconstituted protein. These results suggested that PITC modifies Lys residues located in or near the region of intersubunit contact, hindering interactions among the transducin subunits and impeding formation of the holoenzyme. The other Lys modification reagents inactivated native modified transducin. Incubation of Tα or Tβγ with OPA also resulted in more than 60% inactivation of the reconstituted holoenzyme. On the other hand, AA, CA, and DnsCl inactivated the reconstituted enzyme containing modified Tα, but not that containing modified Tβγ. For OPA, AA, CA, and DnsCl, we were not able to determine which function of transducin was affected: the recognition site for photoexcited rhodopsin, the guanine nucleotide binding pocket, or the site of interaction between the α-subunit and the βγ-heterodimer. Furthermore, these labels may also modify Lys residues located in regions that sterically hinder either the exit or the entrance of guanine nucleotides, producing an indirect inhibitory effect.

Lys-345 of Tα is an excellent candidate for modification by the amino group-specific reagents used in this work. As described above, Lys-345 is located in the C-terminal region of Tα, and the interaction of transducin with photoexcited rhodopsin induces conformational changes in the position of this residue. This is also true for Lys-341, which is located near Lys-345. In particular, OPA may also cross-link suitably placed ε-amino groups of lysine side chains with sulfhydryls of cysteine residues. Lys-345 and Cys-347 are neighboring functional amino acids that may be cross-linked in OPA-modified transducin and/or Tα. The three-dimensional structure of the Tα guanine nucleotide binding pocket also showed the direct involvement of lysyl residues in the binding site; the key lysines are Lys-42 and Lys-266. Modification of either of these lysine residues in transducin and/or Tα would inactivate the protein.

Although the crystal structures of Tα bound to GTPγS and to GDP are virtually identical for 86% of the positions examined (Lambright et al., 1994), there are changes in a small surface area located on one face of Tα. The structural differences are localized to three adjacent regions referred to as switch I (Ser-173 to Thr-183), switch II (Phe-195 to Thr-215), and switch III (Asp-227 to Arg-238). These three regions contain residues of the types characterized in this manuscript: Lys-176 and Glu-182 in switch I; Asp-196, Glu-203, Lys-205, Lys-206, Cys-210, and Glu-212 in switch II; and Asp-227, Glu-232, Asp-233, Asp-234, and Glu-235 in switch III. Chemical modification of any of these amino acids would probably hinder the conformational change induced in Tα by nucleotide exchange and cause the inactivation of transducin.

Cross-linking studies with o-PDM and p-PDM demonstrated the formation of intra- and intermolecular species in Tα, which were responsible for the inactivation of the reconstituted holotransducin. Formation of a 46 kDa cross-linked species between Tβ and Tγ, when the βγ-heterodimer was incubated with o-PDM or p-PDM, did not affect the function of the reassociated enzyme. Hingorani et al. (1988) reported the formation of similar cross-linked products in transducin when p-PDM was used. In their case, the cross-linked products were identified by Western immunoblotting using antisera against purified subunits of transducin (Tα and Tβγ). However, Hingorani et al. (1988) did not measure the effect of p-PDM cross-linking on transducin function. Bubis & Khorana (1990) have shown that cupric phenanthroline catalyzes the formation of a single interchain disulfide bond between the β- and γ-subunits of transducin. The same disulfide bond was formed when holotransducin or the complex of βγ-subunits was treated with the reagent. The residues participating in the disulfide bond were identified as Cys-25 in Tβ and Cys-36 and/or Cys-37 in Tγ. Cupric phenanthroline induced the formation of a new species with an apparent molecular mass of 43 kDa (Bubis & Khorana, 1990), very similar to the 46 kDa cross-link observed in o-PDM- and p-PDM-treated Tβγ. These results suggest that the same cysteine residues of the transducin β- and γ-subunits may be involved in o-PDM- and p-PDM-cross-linked Tβγ.

We also studied the spontaneous formation of disulfide bonds in Tα. Sequence analysis of the radioactive peptides obtained by tryptic digestion of [³H]IAA-modified Tα identified Cys-347 as one of the cysteines involved in the cross-links. The formation of disulfide linkages in the transducin α-subunit inhibited the light-dependent GTPase activity of the reconstituted holoenzyme. The finding that Cys-347 of Tα participates in the formation of these disulfide bonds again demonstrates the important role of this residue in the function of transducin. Wessling-Resnick & Johnson (1989) also reported the formation of intermolecular disulfide linkages between the α-subunits of transducin molecules when the purified protein was placed in a non-reducing buffer system. The specific oligomeric association of α-subunits provides a physical basis for the cooperative activation kinetics described for the rhodopsin transduction system.

The  work  reported  here  identifies  different  functionally  important  cysteines,  lysines, 
and  acidic  residues  in  transducin  that  are  located  either  in  the  domains  of  intersubunit  contact, 
in  the  proximity  of  the  interaction  site  with  rhodopsin,  or  near  the  guanine  nucleotide  binding 
pocket. 
ABSTRACT

X-ray photoelectron spectroscopy (XPS) is a surface-sensitive analytical technique that measures the binding energy of electrons in atoms and molecules. The binding energy can be related to the molecular bonding or oxidation state of an element in the outermost layer of a material, i.e., the top < 100 Å. Thus, XPS can identify chemical species present on the surface of a sample. In this paper, XPS is briefly described, and spectra demonstrating its potential use for probing the surface properties of amino acids, polypeptides, proteins, carbohydrates and glycoproteins are discussed.


INTRODUCTION 

XPS is also referred to as electron spectroscopy for chemical analysis (ESCA). The basis for XPS is the photoelectric effect (1). Irradiation of a material with monochromatic x-rays results in the ejection of photoelectrons from electron orbitals (e.g., s-orbitals) of the sample. The energy of an incident x-ray photon is transferred to an electron, and the part that exceeds the electron's binding energy (plus the spectrometer work function) appears as the kinetic energy of the photoemitted electron (Figure 1). By measuring the kinetic energy ($E_k$) of the ejected photoelectron, with the x-ray photon energy ($h\nu$) known, the binding energy ($E_b$) of that electron can be deduced using the following equation:

$$E_b = h\nu - E_k - w$$

where $w$ is the experimentally determined work function of the spectrometer.
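The relation above can be applied directly once the photon energy and spectrometer work function are known. The Python sketch below assumes a monochromatic Al Kα source (commonly quoted as about 1486.6 eV) and a hypothetical work function of 4.5 eV; in practice the work function is determined for each instrument, as the text notes.

```python
def binding_energy_ev(kinetic_energy_ev, photon_energy_ev, work_function_ev):
    """E_b = h*nu - E_k - w, with all energies in eV."""
    eb = photon_energy_ev - kinetic_energy_ev - work_function_ev
    if eb < 0:
        raise ValueError("kinetic energy too high for this photon energy")
    return eb


AL_K_ALPHA_EV = 1486.6   # approximate Al K-alpha photon energy
WORK_FUNCTION_EV = 4.5   # hypothetical spectrometer work function

# A photoelectron measured at 1197.0 eV kinetic energy
eb = binding_energy_ev(1197.0, AL_K_ALPHA_EV, WORK_FUNCTION_EV)
print(f"E_b = {eb:.1f} eV")
```

A binding energy near 285 eV falls in the region usually assigned to C 1s electrons, which is how peaks in a survey spectrum are attributed to particular elements.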
Figure 1. Schematic representation of the XPS process.


XPS generates information in two modes of analysis: low-resolution (survey) spectra and high-resolution spectra. Qualitative information and atomic composition are normally obtained from a survey spectrum of the sample surface. Detailed chemical bonding information (e.g., oxidation state) is acquired from high-resolution scans of each element.

The binding energy of electrons in an element is unique to the element as well as to its chemical environment. $E_b$ can shift by several eV with changes in oxidation state. For inner-shell electrons, the binding energies in any element X can be directly related to the oxidation state of that element in a molecule in the following progression: $X^{2-} < X^{-} < X^{0} < X^{+} < X^{2+}$; i.e., the larger the positive charge on the element, the greater the attraction of the nucleus for the remaining electrons, and hence the larger the $E_b$.

XPS has routinely been used to examine the chemical structure of various organic and inorganic materials (2). In an article on the microencapsulation of an explosive containing carbon, nitrogen and oxygen (similar to the biological compounds discussed here), equations were derived that allowed determination of the coating thickness and of the mechanism of polymer bonding to the explosive (3).

XPS has not been extensively used in the study of biological systems. A few XPS papers have examined the role of metals in biological systems: Chiu et al. (4) studied the bonding of oxygen to selenium in a glutathione peroxidase model system, Meisenheimer et al. (5) determined monovalent cation compositions in erythrocyte membranes, and Pickart et al. (6) studied Ca²⁺ flux in hepatoma cells during DNA synthesis. In the latter study, an intramembrane Ca²⁺ gradient was established, with the highest levels of Ca²⁺ at the cytoplasmic side rather than towards the extracellular space. A few studies have also used XPS to characterize the surfaces of bacterial cells (7) and to estimate the protein content of seeds (8).

Only a few XPS studies have examined in detail the surface chemistry of biological macromolecules such as proteins (9-11). These papers showed that the zwitterionic nitrogen is in a less positive state following amide formation. In this paper, low- and high-resolution XPS spectra are reported for several amino acids related to the core protein of a glycoprotein. These results show the potential usefulness of this technique for characterizing carbohydrate coatings on polypeptides and proteins.


EXPERIMENTATION 

Samples were applied as a multilayer to a 25 mm² Au metal surface by one of the following methods: 1) as a powder, 2) as a solution in doubly distilled water, 3) as a slurry in methanol, or 4) as a slurry in a mixture of water and methanol. In the latter three cases, the solvents were allowed to evaporate before the samples were placed in the analysis chamber.

The instrument used in this study was a Kratos AXIS spectrometer, which uses a
hemispherical electrostatic analyzer to determine electron kinetic energies. Following
sample introduction, the chamber was evacuated to < 10’^ torr. The x-ray source was an
aluminum monochromator emitting x-rays of 1486.67 eV. The x-ray power was 300 W
(20 mA at 15 kV). During photoemission, the sample surface acquires a positive charge.
To minimize sample decomposition and charging, the sample was bathed in low-energy
electrons (< 1 eV) from a charge neutralizer. In this work, the neutralizer generated
copious electrons, sufficient to minimize differential sample charging. Samples were
irradiated until sufficient data had been collected. X-ray degradation, previously
determined on glycine, was < 1% over two hours of irradiation, and no x-ray damage was
detected in the samples analyzed in this paper. Data collection required about 90 min
per sample.
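The analyzer measures photoelectron kinetic energies, whereas the spectra below are discussed in terms of binding energies (Eb). The standard photoelectric relation Eb = hν - Ek - φ links the two, where φ is the spectrometer work function. The sketch below is illustrative only: the 4.50 eV work function and the kinetic energy are hypothetical values, not measurements from this study, and the power check simply multiplies the stated emission current and voltage.

```python
"""Convert a measured kinetic energy to a binding energy (illustrative sketch)."""

# Photon energy of the monochromatic Al x-ray source stated in the text (eV).
PHOTON_EV = 1486.67


def binding_energy(kinetic_ev: float, work_function_ev: float,
                   photon_ev: float = PHOTON_EV) -> float:
    """Photoelectric relation: Eb = h*nu - Ek - phi (all values in eV)."""
    return photon_ev - kinetic_ev - work_function_ev


def source_power_w(emission_ma: float, voltage_kv: float) -> float:
    """P = I * V; milliamps times kilovolts gives watts directly."""
    return emission_ma * voltage_kv


# Hypothetical inputs: work function 4.50 eV, C 1s electron detected at 1196.17 eV.
eb = binding_energy(1196.17, 4.50)
power = source_power_w(20, 15)
print(f"Eb = {eb:.2f} eV")        # Eb = 286.00 eV
print(f"power = {power:.0f} W")   # power = 300 W
```

In practice the work function and any residual charging are absorbed by referencing to a known C 1s peak, as is done for the binding energies reported in this paper.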

XPS spectra were deconvoluted to a best fit using peak shapes of 70% Gaussian and
30% Lorentzian character to account for tailing toward the high-Eb side. The atomic %
composition of each element present was calculated by dividing the area under each XPS
elemental peak by an instrumental sensitivity factor and normalizing the results to 100%.
The sensitivity factors were calculated theoretically (12) from photoionization
cross-section data. Hydrogen is very difficult to quantify by XPS because the H 1s
electron is a valence electron. Valence electrons are often associated with, or shared
between, two or more atoms and thus cannot easily be used for elemental quantification.
Therefore, hydrogen is not included in the atomic % determinations.
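The quantification step described above, dividing each peak area by a sensitivity factor and normalizing, can be sketched as follows. The sensitivity factors and peak areas here are hypothetical numbers chosen for illustration; they are not the theoretically calculated factors of ref. (12) or data from this study.

```python
"""Atomic % composition from XPS peak areas (illustrative sketch)."""


def atomic_percent(areas: dict[str, float],
                   sensitivity: dict[str, float]) -> dict[str, float]:
    """Divide each peak area by its sensitivity factor, then normalize to 100%.

    Hydrogen is simply absent from the inputs, mirroring its exclusion
    from the atomic % determinations.
    """
    corrected = {el: areas[el] / sensitivity[el] for el in areas}
    total = sum(corrected.values())
    return {el: 100.0 * value / total for el, value in corrected.items()}


# Hypothetical sensitivity factors and peak areas (arbitrary units).
SENSITIVITY = {"C": 1.00, "N": 1.80, "O": 2.93}
areas = {"C": 4000.0, "N": 3600.0, "O": 11720.0}

composition = atomic_percent(areas, SENSITIVITY)
for element, pct in composition.items():
    print(f"{element}: {pct:.1f} %")
```

With these made-up inputs the result is 40% C, 20% N and 40% O, which happens to be the theoretical composition of glycine when hydrogen is ignored.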


RESULTS  AND  DISCUSSION 

As discussed earlier, XPS data are usually acquired at either low or high
resolution. In the first case, the electron binding energies (Eb) of a protein or peptide
are measured to approximately ± 1 eV. This yields qualitative and semi-quantitative
information about the protein surface: the elemental constituents are identified and the
atomic % composition is calculated. In the second case, Eb values are determined to
within ± 0.1 eV; in our work, they were determined to better than 0.1 eV. From these
measurements, the electron (or charge) distribution about each atom in the molecule is
determined, and oxidation-state and chemical-bonding information is inferred.

Low  Resolution  Spectra 

Figure  2  illustrates  a  survey  scan  for  the  simple  amino  acid  glycine.  The  spectrum 
shows  the  three  major  constituents  found  in  most  amino  acids:  C  at  285  eV,  N  at  401  eV  and 
O  at  531  eV.  After  correcting  for  differences  in  sensitivity  factors,  the  area  under  each 
photoelectron  peak  is  proportional  to  the  atomic  concentration.  For  this  amino  acid,  the 
experimentally  determined  composition  (in  atomic  %)  from  the  survey  scan  is  42  %  C,  20 
%  N  and  38%  O.  These  values  are  in  good  agreement  with  the  theoretically  calculated  atomic 
compositions  of  40%  C,  20%  N  and  40%  O. 
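The theoretical values quoted for glycine follow from its molecular formula once hydrogen is dropped, since XPS does not quantify H. A minimal sketch, assuming simple formulas without parentheses (the helper names are hypothetical):

```python
"""Theoretical XPS atomic % from a molecular formula, excluding hydrogen."""
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H5NO2' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def theoretical_atomic_percent(formula: str, exclude=("H",)) -> dict:
    """Atomic % of each element, ignoring elements XPS cannot quantify (H)."""
    counts = {el: n for el, n in parse_formula(formula).items() if el not in exclude}
    total = sum(counts.values())
    return {el: 100.0 * n / total for el, n in counts.items()}


glycine_theory = theoretical_atomic_percent("C2H5NO2")
glycine_measured = {"C": 42.0, "N": 20.0, "O": 38.0}   # survey-scan values from the text

for el, theory in glycine_theory.items():
    print(f"{el}: measured {glycine_measured[el]:.0f} %, theory {theory:.0f} %")
# C: measured 42 %, theory 40 %
# N: measured 20 %, theory 20 %
# O: measured 38 %, theory 40 %
```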

Figure 2. Representative low-resolution XPS spectrum of glycine showing the presence of carbon, nitrogen
and oxygen characteristic of amino acids.

The atomic concentrations obtained from the XPS spectra of the nine different amino
acids found in the human mucin tandem repeat sequence MUC1 (12, 13) are given in Table 1.
MUC1 is a 20-amino-acid polypeptide with the sequence GSTAPPAHGVTSAPDTRPAP. An
amino acid mixture with the composition G2S2T3A4P5HVDR, corresponding to the MUC1
peptide, was prepared and analyzed by XPS. Again, good agreement between experimental
and theoretical values was noted. Among these amino acids, carbon ranges from 43% for
Gly to 64% for Val. Nitrogen is much lower, from 12% for Asp to 32% for Arg; the usual
range of % N in proteins is about 15 to 20%. The atomic % of oxygen varies much like that
of nitrogen, ranging from 16% in Arg to 41% in Ser.
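The theoretical value for the MUC1-equivalent mixture can be reproduced by weighting the heavy-atom counts of each free amino acid by its molar ratio, which in turn follows from counting residues in the tandem repeat sequence. This is a sketch of that arithmetic under the assumption that the sample is a mixture of free amino acids (as described), not the intact peptide, whose peptide bonds would remove one O per bond.

```python
"""Theoretical atomic % of the MUC1-equivalent amino acid mixture (sketch)."""
from collections import Counter

# Heavy-atom counts (C, N, O) of the free amino acids; H is omitted as in XPS.
FREE_AMINO_ACIDS = {
    "G": (2, 1, 2),  # glycine C2H5NO2
    "S": (3, 1, 3),  # serine C3H7NO3
    "T": (4, 1, 3),  # threonine C4H9NO3
    "A": (3, 1, 2),  # alanine C3H7NO2
    "P": (5, 1, 2),  # proline C5H9NO2
    "H": (6, 3, 2),  # histidine C6H9N3O2
    "V": (5, 1, 2),  # valine C5H11NO2
    "D": (4, 1, 4),  # aspartic acid C4H7NO4
    "R": (6, 4, 2),  # arginine C6H14N4O2
}


def mixture_atomic_percent(moles: dict) -> tuple:
    """Mole-weighted C, N, O atomic % of a mixture of free amino acids."""
    totals = [0, 0, 0]
    for aa, n_mol in moles.items():
        for i, n_atoms in enumerate(FREE_AMINO_ACIDS[aa]):
            totals[i] += n_mol * n_atoms
    grand = sum(totals)
    return tuple(100.0 * t / grand for t in totals)


# Molar ratios (G2S2T3A4P5HVDR) follow directly from the MUC1 tandem repeat.
muc1_ratios = Counter("GSTAPPAHGVTSAPDTRPAP")
c, n, o = mixture_atomic_percent(muc1_ratios)
print(f"C {c:.1f} %, N {n:.1f} %, O {o:.1f} %")  # C 52.6 %, N 16.4 %, O 30.9 %
```

Rounded, this gives 53% C, 16% N and 31% O, the theoretical values listed for the MUC1 peptide mixture in Table 1.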


Table 1. Summary of quantitative XPS results on mucin-related materials

                     Atomic % composition¹,²
Material             %C          %N          %O
Ala                  52 (50)     17 (17)     32 (33)
Arg                  53 (50)     32 (36)     16 (17)
Asp                  46 (44)     12 (11)     40 (44)
Gly                  43 (40)     21 (20)     36 (40)
His                  52 (50)     17 (17)     32 (33)
Pro                  63 (62)     17 (16)     30 (31)
Ser                  45 (43)     14 (14)     41 (43)
Thr                  53 (50)     13 (12)     37 (38)
Val                  64 (62)     13 (12)     23 (31)
MUC1 peptide³        52 (53)     17 (16)     30 (31)

¹Values given are the experimentally determined atomic % composition for each element
in the corresponding amino acid.
²Values in parentheses are the theoretically derived atomic % compositions for each
element in the corresponding amino acid.
³Amino acids present in the MUC1 peptide were combined in the molar ratios
G2S2T3A4P5HVDR.
Table 2. Summary of quantitative XPS results on mucins and related materials

                                                Atomic % composition¹
Material                                        %C          %N           %O
Carbohydrate, normal²                           55 (50)     0.3 (0.4)    45 (49)
Carbohydrate, cancer³                           55 (52)     3.4 (3.6)    42 (44)
Porcine mucin⁴                                  77          4.9          18
Porcine mucin, partially deglycosylated⁵        65          8.5          27
Bovine mucin⁴                                   72          3.2          25
Bovine mucin, partially deglycosylated⁵         61          9.7          28

¹Values given are the experimentally determined atomic % composition for each element
in the corresponding material. Values in parentheses are the theoretically derived
atomic % compositions for each element in the corresponding material.

²Carbohydrates common to normal mucins (13, 14) were combined in the molar ratios of
Gali2GlcNAcnGalNAci, where Gal = galactose, GlcNAc = N-acetylglucosamine and
GalNAc = N-acetylgalactosamine.

³Carbohydrates common to cancer-associated mucins (13) were combined in the molar
ratios of NANA1Gal1GalNAc1, where NANA = N-acetylneuraminic acid.

⁴Mucins (from porcine stomach and bovine submaxillary gland) were purchased from
Sigma Chemical Co. (St. Louis, MO).

⁵Mucins were partially deglycosylated by periodate oxidation (15).


The full sensitivity of this technique for identifying specific amino acid
substitutions in simple polypeptides and proteins has not yet been established. Using a
human mucin MUC1 tandem repeat peptide, we have preliminary evidence suggesting that XPS
can readily identify a point mutation in a mutant peptide. The mutation in this peptide
corresponds to a 20% decrease in the number of hydroxyl groups present.

A survey of carbohydrate structures found in human mucins (13, 14) is beginning to
reveal some similarities and differences between protein and carbohydrate in an XPS
spectrum (Table 2). Carbohydrates are not very different from amino acids in carbon content
(~55% C in carbohydrates compared with ~50% in amino acids). The % O is slightly higher
in carbohydrates (~45-49% O) than in protein (~35% O). However, the major difference
between amino acids and carbohydrates is their nitrogen content. In human mucins, nitrogen
ranges from an average of about 0.3 atomic % in samples representing normal mucin
oligosaccharide side chains to about 3 atomic % in the oligosaccharide side chains of
breast cancer-associated mucin (13, 14).

Table 3. Summary of binding energies for glycine, polyglycine and glucose:
high-resolution data

Eb, eV               Glycine (12 runs)    Polyglycine (8 runs)    Glucose (4 runs)
C 1s
  α-C                286.19 ± 0.07        286.42 ± 0.07
  carboxylate        288.38 ± 0.07
  amide                                   288.42 ± 0.07
  alcohol                                                         286.60 ± 0.03
  anomeric                                                        287.94 ± 0.05
N 1s
  zwitterion         401.45 ± 0.05
  amide                                   400.19 ± 0.05
O 1s
  carboxylate        531.15 ± 0.06
  amide                                   531.78 ± 0.06
  alcohol                                                         532.88 ± 0.04
  pyranose ring                                                   533.62 ± 0.07

Eb values are the average energies of the indicated number of runs ± the standard deviation.

The atomic % of nitrogen in carbohydrates is clearly distinguishable from the
nitrogen composition of protein. For example, fully glycosylated porcine mucin showed
a composition of 77% C, 4.9% N and 18% O (Table 2). The measured composition of
periodate-oxidized porcine mucin (Table 2) showed a decrease in the atomic % of carbon
and an increase in the atomic % of both nitrogen and oxygen, indicating that the mucin
core protein is exposed by the removal of carbohydrate during periodate oxidation. Since
the core protein of a related porcine mucin has a theoretical composition of 51% C,
16% N and 33% O (16), it can be concluded that the periodate treatment did not fully
deglycosylate this mucin and that it still bears a high degree of oligosaccharide side
chains branched on GalNAc-O-Thr, which are not susceptible to periodate oxidation (15).
Like the porcine mucin, bovine mucin showed an increase in N content, from 3.2 to 9.7
atomic %, after periodate treatment, again indicating that bovine mucin is not fully
susceptible to periodate removal of its oligosaccharide side chains.
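The reasoning about incomplete deglycosylation can be made semi-quantitative with a crude lever rule: if the measured atomic % N were a linear mix of a pure core-protein value and a pure glycan value, the protein share of the detected atoms would be (N_measured - N_glycan) / (N_protein - N_glycan). This is an illustrative sketch, not an analysis performed in this study. It assumes the 16% N theoretical core-protein value from the text and, as a hypothetical glycan end member, the 0.3% N measured for the normal-mucin carbohydrate mixture; it also ignores XPS sampling depth and layering.

```python
"""Crude two-component (lever-rule) estimate of exposed core protein (sketch)."""


def protein_fraction(n_measured: float, n_protein: float, n_glycan: float) -> float:
    """Fraction of detected C+N+O atoms attributed to protein.

    Assumes the measured atomic % N is a linear mix of a pure-protein value
    and a pure-glycan value; ignores sampling-depth and layering effects.
    """
    if n_protein == n_glycan:
        raise ValueError("end-member nitrogen contents must differ")
    return (n_measured - n_glycan) / (n_protein - n_glycan)


N_CORE_PROTEIN = 16.0   # % N, theoretical core protein of a related porcine mucin
N_GLYCAN = 0.3          # % N, assumed glycan end member (normal-mucin carbohydrate)

samples = {  # measured % N from Table 2
    "porcine mucin": 4.9,
    "porcine mucin, periodate-treated": 8.5,
    "bovine mucin": 3.2,
    "bovine mucin, periodate-treated": 9.7,
}
for name, n_pct in samples.items():
    f = protein_fraction(n_pct, N_CORE_PROTEIN, N_GLYCAN)
    print(f"{name}: ~{100 * f:.0f} % of detected atoms attributed to protein")
```

Every estimate stays well below 100%, which is consistent with the conclusion that periodate treatment left substantial carbohydrate on both mucins; a different glycan end member (for example a sialic acid-rich one) would shift the numbers.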

High  Resolution  Spectra 

Figures 3 through 5 show the high-resolution XPS C 1s, N 1s and O 1s spectra,
respectively, of one amino acid (glycine), one polypeptide (polyglycine, average Mr ~4,500)
and one carbohydrate (glucose). The analyses of these data are given in Table 3. As can be
seen from the data (see also Fig. 3a), glycine has two types of carbon atoms: one methylene
and one carboxylate. (A small third carbon peak is also present. Almost all materials
contain a carbonaceous contaminant, sometimes referred to as ubiquitous, or residual,
carbon, whose source is often not readily identifiable. The C 1s electrons from this carbon
source are typically found at an Eb of ~285 eV and have not been assigned to any carbon
species of the amino acid.) The higher binding energy peak at 288.38 ± 0.07 eV corresponds
to the carboxylate carbon, and the methylene carbon appears at a lower binding energy of
286.19 ± 0.07 eV. This is consistent with the molecular bonding and electron charge
distribution, in which the carboxylate carbon is in a higher positive oxidation state than
the methylene carbon. Also, the peak intensities are in about a 1:1 ratio, consistent with
the abundance of each carbon type in this amino acid. Fig. 3b shows the C 1s data from
polyglycine. Again, two peaks are seen: one represents the α-carbon at 286.42 ± 0.07 eV
and the other the amide carbon at 288.42 ± 0.07 eV. The glucose C 1s spectrum (Fig. 3c)
is completely different from those of the two glycine compounds. This carbohydrate
contains carbon atoms in two different chemical environments: one at 286.60 ± 0.03 eV,
characteristic of an alcohol, and the other at 287.94 ± 0.05 eV, characteristic of the
anomeric carbon. Thus, the Eb values distinguish the different oxidation states of each
carbon species (i.e., the carboxylate carbon of a zwitterion can be distinguished from an
anomeric, an alcohol or an aliphatic carbon). The binding energy data reported above are
referenced to the C 1s level of the two methyl groups on Leu, which was set at 285.00 eV.
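The deconvolution of the glycine C 1s envelope can be illustrated with the 70% Gaussian / 30% Lorentzian peak shape described in the experimental section. This sketch makes several assumptions not stated in the text: the peak shape is taken as the symmetric sum form, peak centers are fixed at the values above (plus the ~285 eV contaminant), the FWHM of 1.2 eV and the noise level are hypothetical, and only the heights are fitted by linear least squares. A full analysis would also refine positions and widths nonlinearly.

```python
"""Deconvolving a synthetic glycine C 1s envelope with 70/30 G/L peaks (sketch)."""
import math
import random


def gl_peak(x, center, fwhm, gauss_frac=0.7):
    """Unit-height peak: gauss_frac Gaussian + (1 - gauss_frac) Lorentzian, same FWHM."""
    t = (x - center) / fwhm
    gauss = math.exp(-4.0 * math.log(2.0) * t * t)
    lorentz = 1.0 / (1.0 + 4.0 * t * t)
    return gauss_frac * gauss + (1.0 - gauss_frac) * lorentz


def gl_area(height, fwhm, gauss_frac=0.7):
    """Analytic area of the sum-form peak with the given height and FWHM."""
    g_area = fwhm * math.sqrt(math.pi / (4.0 * math.log(2.0)))
    l_area = math.pi * fwhm / 2.0
    return height * (gauss_frac * g_area + (1.0 - gauss_frac) * l_area)


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_heights(xs, ys, centers, fwhm):
    """Linear least squares (normal equations) for heights at fixed centers."""
    basis = [[gl_peak(x, c, fwhm) for x in xs] for c in centers]
    ata = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    aty = [sum(p * y for p, y in zip(bi, ys)) for bi in basis]
    return solve(ata, aty)


centers = [285.00, 286.19, 288.38]   # contaminant, methylene, carboxylate (eV)
true_heights = [0.15, 1.0, 1.0]      # hypothetical synthetic spectrum
fwhm = 1.2                           # hypothetical FWHM (eV)
rng = random.Random(0)
xs = [282.0 + 0.05 * i for i in range(181)]   # 282 to 291 eV
ys = [sum(h * gl_peak(x, c, fwhm) for h, c in zip(true_heights, centers))
      + rng.gauss(0.0, 0.01) for x in xs]

heights = fit_heights(xs, ys, centers, fwhm)
areas = [gl_area(h, fwhm) for h in heights]
ratio = areas[2] / areas[1]
print(f"carboxylate : methylene area ratio = {ratio:.2f}")
```

Because both glycine peaks share one shape and width here, their area ratio equals their height ratio and comes out close to 1:1, mirroring the intensity ratio reported for glycine.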
Figure 3. High-resolution XPS C 1s spectra of (a) glycine, (b) polyglycine and (c) glucose showing unique
patterns for the different carbon atoms in each compound. Jagged curves are the raw data and the smooth curves
are the best-fit approximations as described in the text.


Fig. 4 shows the high-resolution N 1s spectra of glycine, polyglycine and glucose.
A single N 1s peak is observed in both the glycine (Fig. 4a) and polyglycine (Fig. 4b)
spectra, but no nitrogen is observed in the glucose spectrum (Fig. 4c). The N 1s peaks
appear at different positions in the first two spectra: the nitrogen atom of zwitterionic
glycine is observed at Eb = 401.45 ± 0.05 eV, which is characteristic of nitrogen in a +1
oxidation state, and the amide nitrogen at Eb = 400.19 ± 0.05 eV, where nitrogen has a lone
pair of electrons. These nitrogen results are similar to those reported previously (8-10).

Figure 4. High-resolution XPS N 1s spectra of (a) glycine, (b) polyglycine and (c) glucose showing nitrogen
binding energy differences between zwitterion and amide structures. The data also show the absence of
nitrogen in the carbohydrate glucose. Jagged curves are the raw data and the smooth curves are the best-fit
approximations.

Glycine has two oxygen atoms; however, they are equivalent owing to resonance in the
carboxylate anion. Fig. 5a shows a single O 1s peak with a binding energy of
531.15 ± 0.06 eV. Upon polymerization to form the peptide, again only one O 1s peak is
observed (Fig. 5b), but the binding energy increases to 531.78 ± 0.06 eV. The carbohydrate
O 1s spectrum shows a broad peak that has been deconvoluted into two components: the
primary one at 532.88 ± 0.04 eV and a smaller one, with approximately one-fifth the area,
at 533.62 ± 0.07 eV. These two peaks are characteristic of oxygen atoms in the alcohol and
pyranose ring environments, respectively.
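The relative peak areas can be checked against simple atom counting, assuming peak area scales with the number of atoms in each environment. The environment assignments below are our own hand-labeled assumption following the assignments in the text: glucose is taken in its pyranose form, and C5 (bonded to the ring oxygen) is grouped with the alcohol carbons because, like them, it carries a single C-O bond.

```python
"""Expected peak-area ratios from atom counts (illustrative sketch)."""
from fractions import Fraction

# Hand-assigned environments of each C or O atom (an assumption, following
# the peak assignments in the text). Glucose is taken as the pyranose ring.
ENVIRONMENTS = {
    "glycine C 1s": ["methylene", "carboxylate"],
    "polyglycine C 1s": ["alpha", "amide"],                 # per repeat unit
    "glucose C 1s": ["anomeric"] + ["alcohol"] * 5,         # C1 vs C2-C6
    "glucose O 1s": ["ring"] + ["alcohol"] * 5,             # O5 vs O1-O4, O6
}


def area_ratio(spectrum: str, minor: str, major: str) -> Fraction:
    """Expected minor:major area ratio if area scales with atom count."""
    atoms = ENVIRONMENTS[spectrum]
    return Fraction(atoms.count(minor), atoms.count(major))


print(area_ratio("glycine C 1s", "carboxylate", "methylene"))   # 1
print(area_ratio("glucose O 1s", "ring", "alcohol"))            # 1/5
```

The 1:5 ring-to-hydroxyl oxygen count matches the "approximately one-fifth" area of the smaller glucose O 1s component, and the 1:1 glycine carbon count matches the observed C 1s intensities.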

Figure 5. High-resolution XPS O 1s spectra of (a) glycine, (b) polyglycine and (c) glucose showing oxygen
binding energy differences between carboxylate, amide and carbohydrate (alcohol). Jagged curves are the raw
data and the smooth curves are the best-fit approximations.

SUMMARY


XPS is a surface-sensitive technique capable of distinguishing between the core protein
and the carbohydrate coating of glycoproteins. Deglycosylation of porcine and bovine
mucins by periodate oxidation was not complete. The compositions of normal and breast
cancer-associated oligosaccharide side chains differ, most notably in nitrogen content.
Further examination of the sensitivity of XPS with respect to sample amount and limits of
detection is an exciting area of protein structure analysis.
MEASUREMENT OF ASP/ASN DAMAGE IN
AGING PROTEINS, CHEMICAL
INTERCONVERSION OF ASPARTYL
ISOMERS, ¹⁸O TAGGING OF
ENZYMATICALLY REPAIRED ASPARTYL
SITES, AND ENZYME AUTOMETHYLATION
AT SITES OF ASP/ASN DAMAGE


Jonathan  A.  Lindquist  and  Philip  N.  McFadden 

Department  of  Biochemistry  and  Biophysics 
Corvallis, Oregon 97331


INTRODUCTION 

Asp/Asn damage in aging proteins, which results from the propensity of L-Asn and L-Asp
residues to convert spontaneously to a mixture of α-epimerized and β-isomerized aspartyl
products via succinimide intermediates (Figure 1), is a practical problem for researchers
seeking to isolate and study pure proteins. In particular, the advent of protein
overexpression systems and the convenience of working with large quantities of protein
have made it increasingly common for investigators with no prior intention of studying
spontaneous protein damage to find that a protein of interest has undergone a
transformation that is ultimately traced to Asp/Asn damage. The first indication of this
problem is generally the detection of isoforms of a polypeptide with altered
chromatographic or electrophoretic properties, often arising from a heat step in the
purification of the protein. In other cases, the formation of these isoforms has been
traced to a prolonged fermentor run or a lengthy storage period during production of the
protein. Although the isoforms may make up only a few percent of the total material, their
presence is troubling because it compromises the purity of the protein.

Such spontaneously altered proteins have been successfully characterized chemically
and physically; recent examples include characterization of Asp/Asn damage in
overexpressed forms of hirudin, deoxyribonuclease I, calbindin, interleukin-1β, epidermal
growth factor, human growth hormone, anti-p185HER2, tissue plasminogen activator,
phosphocarrier protein, CD4, somatotropin, and interleukin-1α (Bischoff, et al., 1993,
Cacia, et al., 1993, Chazin, et al., 1989, Daumy, et al., 1991, George-Nascimento, et al.,
1990, Johnson, et al., 1989, Kwong and Harris, 1994, Paranandi, et al., 1994, Sharma,
et al., 1993, Teshima, et al., 1991, Violand, et al., 1990, Wingfield, et al., 1987). These
efforts have used a varied combination of methods to pinpoint sites of Asp/Asn damage in
the respective proteins, the most general of which have included a) peptide mapping and
detection of chromatographically altered protein fragments obtained through enzymatic
and/or chemical cleavage reactions, b) determination of asparagine deamidation by mass
spectral analysis, c) the failure of the Edman cleavage reaction at β-isomerized aspartyl
residues, and d) diagnostic enzymatic methylation of peptide fragments by protein
(D-aspartyl/L-isoaspartyl) carboxyl methyltransferase (PCM). The widespread occurrence of
protein Asp/Asn damage in many sequence contexts and in many different classes of
proteins indicates that this type of protein damage will continue to plague researchers and
companies interested in producing large amounts of pure protein.


Figure 2. Enzymatic methylation of damaged protein. Protein (D-aspartyl/L-isoaspartyl) carboxyl
methyltransferase (PCM) incorporates the methyl group from S-adenosylmethionine into ester linkage with the
β-carboxyl of D-aspartyl residues and the α-carboxyl of L-isoaspartyl residues. The same enzyme active site
is capable of methylating both forms of Asp/Asn damage, possibly because, as shown in the box, free rotation
about the N-C bond can yield similar configurations of D-Asp and L-Isoasp in which the esterified carboxyl
group is in approximately the same position in space relative to the α-carbon and the rest of the protein
backbone.


Asp/Asn damage of aging proteins is also a physiological problem for organisms
dependent on the integrity of their proteins. Many tissue and cell proteins have been found
to contain such damage, including, for example, a large proportion of the β-amyloid
protein associated with Alzheimer's dementia (Roher, et al., 1993). Perhaps an even
stronger indication of the physiological relevance of Asp/Asn damage is the presence in
most cells of the above-mentioned enzyme, PCM, whose function is the methylation and
processing of D-aspartyl and L-isoaspartyl residues that form as the result of intracellular
Asp/Asn damage (Figure 2). The methylation of these sites and their rapid demethylation by
a spontaneous mechanism at physiological pH have been shown in model studies to convert
these abnormal aspartyl isomers to normal L-aspartyl residues (Figure 3). Hence, the
function of this enzyme evidently relates to the repair of sites of Asp/Asn damage, which
could either restore a protein's function or allow complete proteolytic degradation of a
protein that might otherwise resist degradation because of the presence of abnormal amino
acid isomers.
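The cyclic character of this repair pathway (methylation, succinimide formation, epimerization, hydrolysis, and re-methylation of D-Asp and L-isoAsp) can be illustrated with a simple discrete-cycle model. This is a hypothetical sketch, not the kinetic analysis used by the authors: it assumes every PCM substrate completes one cycle per step, that 5% of succinimides epimerize before hydrolysis, and that hydrolysis yields Asp and isoAsp in a 1:3 ratio; all three parameters are assumptions.

```python
"""Discrete-cycle model of methylation-dependent repair (illustrative sketch)."""


def simulate_repair(cycles: int, asp_fraction: float = 0.25,
                    epimerized_fraction: float = 0.05) -> list:
    """Track species fractions after each methylation/succinimide/hydrolysis cycle.

    Assumptions (hypothetical, not measured values): every PCM substrate
    (L-isoAsp, D-Asp) completes one cycle per step; a fraction
    `epimerized_fraction` of succinimides flips chirality before hydrolysis;
    hydrolysis gives Asp with probability `asp_fraction`, isoAsp otherwise.
    """
    l_isoasp, d_asp = 1.0, 0.0   # PCM substrates, recycled
    l_asp, d_isoasp = 0.0, 0.0   # not PCM substrates, accumulate
    history = []
    for _ in range(cycles):
        l_succ = l_isoasp * (1 - epimerized_fraction) + d_asp * epimerized_fraction
        d_succ = l_isoasp * epimerized_fraction + d_asp * (1 - epimerized_fraction)
        l_asp += l_succ * asp_fraction
        l_isoasp = l_succ * (1 - asp_fraction)
        d_asp = d_succ * asp_fraction
        d_isoasp += d_succ * (1 - asp_fraction)
        history.append((l_isoasp, d_asp, l_asp, d_isoasp))
    return history


def cycles_to_reach(target: float, max_cycles: int = 100) -> int:
    """First cycle count at which L-Asp + D-isoAsp reaches `target`."""
    for n, (_, _, l_asp, d_isoasp) in enumerate(simulate_repair(max_cycles), start=1):
        if l_asp + d_isoasp >= target:
            return n
    raise ValueError("target not reached")


for n, (l_iso, d_asp, l_asp, d_iso) in enumerate(simulate_repair(6), start=1):
    print(f"cycle {n}: L-Asp {l_asp:.3f}  D-isoAsp {d_iso:.3f}  "
          f"still methylatable {l_iso + d_asp:.3f}")
print("cycles for 80 % conversion:", cycles_to_reach(0.8))
```

With these hypothetical parameters, five cycles are needed to convert 80% of the starting L-isoaspartyl residue, and the end products are dominated by L-Asp; other hydrolysis or epimerization fractions would shift both results.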

Figure 3. Methylation-dependent protein repair. Enzymatically formed methyl esters of D-aspartyl and
L-isoaspartyl residues are subject to rapid nonenzymatic displacement by nucleophilic attack of the amide
nitrogen of the adjacent amino acid residue toward the C-terminus. This results in formation of succinimides,
which are in turn subject to spontaneous epimerization and hydrolysis to yield four products (L-Asp, D-Asp,
L-Isoasp and D-Isoasp). Two of these (D-Asp, L-Isoasp) can be methylated again by PCM, affording another
opportunity for demethylation, succinimide formation, and succinimide hydrolysis. Eventually, the products
of the process are highly enriched in L-Asp and D-Isoasp. L-Asp, being a normal α-amino acid, may restore
function to the damaged protein, or may at least permit the aging protein to be fully destroyed by proteolytic
mechanisms. Although D-Isoasp is not a normal α-amino acid, it may be a functional equivalent of L-Asp,
since L-Asp and D-Isoasp might adopt similar configurations, as might D-Asp and L-Isoasp (Figure 2).

This paper covers four aspects of Asp/Asn damage and the D-Asp/L-Isoasp
enzymatic methylation pathway. First, some improvements are suggested in the use of
chemical reduction assays for the products of Asp/Asn damage (Carter and McFadden,
1994a, Carter and McFadden, 1994b). Second, a technique is described that can chemically
interconvert normal aspartyl and isoaspartyl residues, both for synthesizing model damaged
peptides and, conversely, for enabling Edman sequencing through an isoaspartyl linkage.
Third, recent isotope-labeling evidence in support of methylation-dependent repair of
Asp/Asn damage is reviewed. Fourth, automethylation involving an Asp/Asn-damaged
subpopulation of PCM (termed the aPCM fraction) is described (Lindquist and McFadden,
1994a), and a model is presented for the unusual kinetics of PCM automethylation.


MEASUREMENT OF ISOASPARTIC ACID AND PROTEIN
SUCCINIMIDES BY CHEMICAL REDUCTION

The hydrolysis of a protein to free amino acids destroys any information about how
an aspartyl residue was linked within the protein, so it has not been possible to measure
succinimidyl and isoaspartyl residues by conventional amino acid analysis. Recently we
investigated whether two related chemical reduction approaches might convert succinimides
and isoaspartic acid to derivatives that are stable to protein hydrolysis. The first method
involves reductive ring-opening of protein succinimides by sodium borohydride to yield
homoserine and isohomoserine upon protein hydrolysis (Figure 4). This technique was
validated with model compounds containing known quantities of succinimides (Carter,
et al., 1994b). The second method uses borane (BH3) reduction to convert the free
α-carboxyl group of isoaspartyl residues to an alcohol, yielding isohomoserine that can be
detected by protein hydrolysis and amino acid analysis (Figure 5). This method has also
been validated with model polypeptides (Carter, et al., 1994a).

Borane reduction of polypeptides can be improved by using the commercially
available borane dimethyl sulfide complex as the reducing agent in place of the borane
tetrahydrofuran complex used previously (Carter, et al., 1994a). This change is
recommended partly because borane dimethyl sulfide is a more stable and less hazardous
reagent than borane tetrahydrofuran, and partly because borane dimethyl sulfide performed
better in a direct comparison of the two reagents in the reduction of
N-carbobenzoxy-L-aspartic acid β-benzyl ester to N-carbobenzoxy-L-isohomoserine
β-benzyl ester.

Figure 5. Detection of isoaspartic acid as the corresponding
alcohol. The reduction of protein carboxyl groups by borane
treatment (Atassi and Rosenthal, 1969, Rosenthal and Atassi,
1967) as applied to protein isoaspartyl groups results in
formation of isohomoserine.

A general difficulty in applying reduction methods to the assay of Asp/Asn damage
in large proteins is the detection of the substoichiometric amount of isohomoserine that
typically results from borohydride or borane reduction. For example, less than one percent
of the total aspartic acid and asparagine may be present as succinimide and/or isoaspartic
acid, so only a small amount of isohomoserine can be expected. A useful adjunct in such
cases is to degrade most of the α-amino acids in an amino acid hydrolyzate with snake
venom L-amino acid oxidase, leaving the β-amino acid isohomoserine as a stronger signal
above the background. However, a further difficulty in the analysis of small amounts of
isohomoserine is the low ninhydrin color constant of this amino acid. The reddish color
intensity following ninhydrin spraying of isohomoserine separated by thin-layer
chromatography is less than 1/10th that of equivalent amounts of aspartic acid, and
ion-exchange separation of isohomoserine with post-column ninhydrin detection indicates a
ninhydrin color yield (570 nm + 460 nm) as little as 1/50th that of equivalent amounts of
aspartic acid. Depending on the amount of starting material, then, a means other than
ninhydrin may be necessary for detection of isohomoserine. Derivatization of isohomoserine
with either phenylisothiocyanate or dabsyl chloride, followed by separation of the
conjugates by reversed-phase HPLC, is a promising route to isohomoserine analysis, since
the isohomoserine derivatives absorb in the ultraviolet (PTC-isohomoserine) and at 460 nm
(dabsyl-isohomoserine) approximately as well on a molar basis as derivatives of α-amino
acids.
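The detection problem is a product of two small numbers: the fraction of Asp/Asn present as isoaspartate and the color yield of isohomoserine relative to aspartic acid. The sketch below works through that arithmetic using the color-yield ratios given in the paragraph above, under the hypothetical assumptions that 1% of Asp+Asn is damaged and that reduction converts it quantitatively to isohomoserine.

```python
"""Relative detector response for isohomoserine vs aspartic acid (sketch)."""
from fractions import Fraction


def relative_signal(molar_fraction: Fraction, response_ratio: Fraction) -> Fraction:
    """Signal of the minor analyte relative to Asp: amount ratio x response ratio."""
    return molar_fraction * response_ratio


ISOASP_FRACTION = Fraction(1, 100)   # hypothetical: 1% of Asp+Asn damaged
DETECTION = {
    "ninhydrin, ion exchange (worst case)": Fraction(1, 50),
    "ninhydrin, thin layer (upper bound)": Fraction(1, 10),
    "PTC or dabsyl derivative (approx. equal)": Fraction(1, 1),
}
for method, ratio in DETECTION.items():
    signal = relative_signal(ISOASP_FRACTION, ratio)
    print(f"{method}: {float(signal):.4%} of the Asp signal")
```

Under these assumptions, post-column ninhydrin detection leaves the isohomoserine peak at about 1/5000 of the aspartic acid peak, whereas a derivative with a molar response similar to that of α-amino acids would raise it roughly fifty-fold.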


INTERCONVERSION  OF  ASP  AND  ISOASP  RESIDUES  BY 
CHEMICAL  ESTERIFICATION  AND  DE-ESTERIFICATION 

There is a convenient procedure for converting normal L-Asp peptides to peptide
mixtures containing the several aspartyl isomers (McFadden and Clarke, 1986). First, the
methyl ester of the L-aspartyl side chain is formed by treatment with acidic methanol. The
ester is then displaced via a succinimide intermediate upon mild alkaline treatment. Finally,
hydrolysis of the succinimide gives rise to the mixture. The main product in the mixture is
the L-isoaspartyl derivative, which can be purified to serve as a stoichiometric substrate in
various studies of protein (D-aspartyl/L-isoaspartyl) carboxyl methyltransferase. For
example, L-Trp-L-Met-L-Isoasp-L-Phe-NH2 was prepared in this manner for the studies
described below, and recently an esterified/de-esterified preparation of bovine serum
albumin has been made and found in enzymatic assays to be extensively methylated by PCM.
Similar chemistry could be used to convert isoaspartyl linkages to normal aspartyl linkages,
which could permit Edman sequencing beyond an otherwise sequence-terminating
isoaspartyl site.

The major limitation of this procedure is that glutamic acid and C-terminal carboxyls
are also methyl esterified and are not extensively de-esterified under mild alkaline
conditions. This results in additional complexity of the mixture with the varied presence of these
other  methylated  carboxyl  groups.  We  have  recently  explored  a  general  solution  to  this 
problem  by  chemically  forming  the  benzyl  esters  of  peptide  carboxyl  groups.  Here,  the  mild 
alkaline  treatment  of  the  peptide  ester  is  again  expected  to  facilitate  the  formation  of 
succinimide  rings  that  are  then  hydrolyzable  to  mixtures  of  aspartyl  isomers.  The  remaining 
benzyl  esters  of  glutamic  acid  residues  and  of  the  C-termini  can  then  be  removed  by  catalytic 
reduction,  most  conveniently  by  palladium  catalyst  with  a  hydrogen  donor  such  as  formate. 
This  approach  has  met  limited  success  thus  far  in  our  hands  because  the  initial  formation  of 
the  benzyl  ester  is  not  as  facile  as  the  formation  of  the  methyl  ester.  Even  so,  it  will  be 
worthwhile  to  develop  improved  esterification  conditions  since  this  approach  can  enable  the 
interconversion  of  aspartyl  isomers  without  modifying  the  other  carboxyls. 


METHYLATION-DEPENDENT  PROTEIN  REPAIR  VIA  REPEATED 
PASSAGE  THROUGH  A  SUCCINIMIDE  INTERMEDIATE 

A rather inefficient aspect of the pathway for methylation-dependent repair of
L-isoaspartyl sites in peptides is the requirement for repeated formation and hydrolysis of
the succinimide intermediate. Although kinetic evidence suggested that on the order of
5 cycles of methylation, demethylation, and succinimide hydrolysis are necessary to effect
complete repair of a peptide, this assumption had not been directly tested. Recently, the
expectation that ¹⁸O from [¹⁸O]water would be incorporated into the peptide upon
succinimide hydrolysis was used to test for multiple passages through the succinimide
intermediate during peptide repair (Figure 6). The model peptide used was
L-Trp-L-Met-L-Isoasp-L-Phe-NH2, which had itself been prepared by chemical
esterification/de-esterification as described above. [¹⁸O]Water (43%) was included in the
reaction medium, which, in addition to the isopeptide, contained bovine erythrocyte protein
carboxyl methyltransferase, S-adenosylmethionine (AdoMet), and pH 7.8 phosphate buffer.
After a 48-hour reaction, the peptide products were purified by reversed-phase HPLC, and
peptide masses were measured by fast-atom bombardment mass spectrometry. The
identification and quantification of doubly labeled normal aspartyl peptide as a repaired
product fit closely with quantitative predictions of the extent of methylation/demethylation/
succinimide hydrolysis expected in about 27 hours, rather than the actual 48 hours of the
reaction (Lindquist and McFadden, 1994b). Thus, the theoretical kinetics of repair closely
matched the experimental results, verifying that multiple succinimide hydrolyses take place,
but with the caveat that little peptide repair took place during the final ~21 hours of the
reaction. We originally ascribed the failure of the repair reaction to go to completion either
to enzyme denaturation during the lengthy incubation or to the buildup of
S-adenosylhomocysteine (AdoHcy), the end-product inhibitor of methyltransferases.
Recently, however, we have
found  an  additional  factor  that  may  block  the  completion  of  the  repair  reaction  in  that 
S-methylthioadenosine  (MTA)  has  been  detected  in  repair  reaction  mixtures  as  a  substantial 
spontaneous  breakdown  product  of  AdoMet.  Since  MTA  may  be  equal  to  or  more  potent 
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Figure  6.  L-Succinimide  (L-imide)  postulated  as  the  central  intermediate  during  peptide  repair,  showing  the 
sites  (asterisks)  of  potential  incorporation  of during  hydrolytic  ring  opening  in  the  presence  of  [’^O]  water. 


than  AdoHcy  as  a  methyltransferase  inhibitor,  this  spontaneous  side-reaction  may  be  a  major 
cause  of  the  slowing  of  the  rate  of  peptide  repair. 
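Why roughly five cycles might be needed can be illustrated with a simple geometric model. This is a sketch under stated assumptions, not the kinetic model used in the work cited above: it assumes that each passage through the succinimide yields normal L-aspartyl peptide with a fixed probability `p_asp` and otherwise regenerates L-isoaspartyl peptide that must be methylated again, and it ignores racemization and any slowing of the enzymatic steps. The value `p_asp = 0.2` is purely illustrative; it happens to give a mean of 5 passages per repaired molecule.

```python
"""Geometric sketch of repair by repeated passage through a succinimide.

Assumption (illustrative, not from the source): each succinimide hydrolysis
yields normal L-Asp peptide with probability p_asp; otherwise L-isoAsp is
regenerated and re-enters the methylation cycle. D-forms are ignored.
"""


def _check(p_asp: float) -> None:
    if not 0.0 < p_asp <= 1.0:
        raise ValueError("p_asp must be in (0, 1]")


def fraction_repaired(p_asp: float, cycles: int) -> float:
    """Fraction of the starting L-isoAsp peptide converted to L-Asp after `cycles` passages."""
    _check(p_asp)
    return 1.0 - (1.0 - p_asp) ** cycles


def mean_cycles_to_repair(p_asp: float) -> float:
    """Expected number of passages per repaired molecule (mean of a geometric distribution)."""
    _check(p_asp)
    return 1.0 / p_asp


def cycles_for_target(p_asp: float, target: float) -> int:
    """Smallest number of passages that repairs at least `target` of the peptide."""
    _check(p_asp)
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 0
    while fraction_repaired(p_asp, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    p = 0.2  # hypothetical per-passage yield of normal L-Asp
    print(mean_cycles_to_repair(p))  # 5.0
    for n in (1, 5, 10):
        print(n, round(fraction_repaired(p, n), 3))
    print(cycles_for_target(p, 0.9))  # 11
```

Under this model repair is never strictly complete; each additional passage repairs a constant fraction of what remains, which is one way to see why a slowdown of the enzymatic steps late in the incubation would leave a measurable unrepaired residue.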


A  DAMAGED  SUBPOPULATION  OF  PROTEIN  (D-ASPARTYL/ 
L-ISOASPARTYL)  CARBOXYL  METHYLTRANSFERASE  IS 
METHYLATED  BY  A  HIGH  AFFINITY,  LOW-TURNOVER 
REACTION 

Interesting possibilities for feedback can be predicted if enzymes that detect
damage in other aging proteins are themselves damaged with age. As an example, bovine
erythrocyte PCM was recently found to methylate itself on a subpopulation of enzyme
molecules (Lindquist et al., 1994a). This subpopulation of presumably damaged,
L-isoAsp- and D-Asp-containing enzyme molecules has been termed the aPCM fraction. From the
known specific activity of the [3H-methyl]S-adenosylmethionine used in the radioactive
automethylation assay, it is calculated that aPCM molecules make up approximately 1% of
the total PCM population in the cell. Such a low stoichiometry of methylation is expected,
given that over the lifetime of the cell there is only a partial spontaneous conversion of amino
acid sites to D-aspartyl and L-isoaspartyl residues, and given the hypothesis that part of this
spontaneous damage is repaired by the enzymatic methylation pathway. aPCM can be partly
enriched by anion-exchange chromatography and then quantified by HPLC (Figure 7),
probably because deamidation of an Asn residue yields the slightly more negatively charged
aPCM. Preparations with up to about 10% aPCM have been obtained in this manner.
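The ~1% estimate rests on a simple conversion: incorporated radioactivity divided by the specific activity of the labeled AdoMet gives moles of methyl groups, which are then divided by the moles of PCM in the assay. The sketch below shows only that arithmetic, with hypothetical numbers rather than data from this study; it assumes one methylatable site per aPCM molecule and complete methylation of the aPCM fraction, and it ignores the small competing ester hydrolysis that the Figure 8 legend says was taken into account.

```python
def methyl_pmol(incorporated_dpm: float, specific_activity_dpm_per_pmol: float) -> float:
    """Radioactivity incorporated into protein -> pmol of methyl groups."""
    return incorporated_dpm / specific_activity_dpm_per_pmol


def apcm_fraction(incorporated_dpm: float,
                  specific_activity_dpm_per_pmol: float,
                  pcm_pmol: float) -> float:
    """Methyl groups incorporated per PCM molecule, read as the aPCM fraction.

    Hypothetical assumptions: one methylatable site per aPCM molecule, the
    aPCM fraction is fully methylated at the end of the assay, and ester
    hydrolysis during the incubation is negligible.
    """
    return methyl_pmol(incorporated_dpm, specific_activity_dpm_per_pmol) / pcm_pmol


# Hypothetical inputs, not data from this study.
frac = apcm_fraction(incorporated_dpm=2_000,
                     specific_activity_dpm_per_pmol=100,
                     pcm_pmol=2_000)
print(f"aPCM fraction ~ {frac:.1%}")  # aPCM fraction ~ 1.0%
```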

To investigate the mechanism of PCM automethylation, assays were performed at
several different enzyme dilutions. The specific rate of aPCM methylation was found to
increase with PCM concentration. This shows that the automethylation reaction involves
more than a single PCM molecule, since the specific rate of an intramolecular (intrapeptide)
methylation reaction would not depend on enzyme concentration. Most likely, enzyme
automethylation involves the incorporation of a methyl group into an aPCM molecule by the
activity of a second PCM molecule.

The specific rate of aPCM methylation plateaus at high concentrations of total PCM (Figure 8),
indicative of a saturable reaction. The assumption of a rapid equilibrium in the interaction
between aPCM and active PCM has allowed the derivation of a rate equation that gives a
good theoretical fit to our dilution experiments (McFadden and Lindquist, 1994). In this
equation,

$$v' = \frac{v}{[\mathrm{PCM}]_{\mathrm{tot}}} = \frac{\alpha\, k_p\, [\mathrm{PCM}]_{\mathrm{tot}}}{K_s + [\mathrm{PCM}]_{\mathrm{tot}}}$$

the specific rate, v', of aPCM methylation is given as a function of the total PCM
concentration, [PCM]tot; α is the fractional population of aPCM (e.g., 2%); kp is the turnover
number for the aPCM methylation reaction; and Ks is the dissociation constant between
PCM and aPCM. By applying this equation to the experimental dilution studies, values
for the kinetic constants were calculated to be kp = 0.0095 min⁻¹ and Ks = 0.5 µM. These values
were constant in experiments with different PCM preparations, including those containing
different percentages of aPCM (Figure 8). These values for Ks and kp are interesting and
somewhat surprising. The dissociation constant for aPCM methylation, Ks, is lower than the
corresponding constants for the methylation of most other polypeptides by PCM (Lowenson
and Clarke, 1991), which could indicate that PCM has a high affinity for its damaged
"brethren." While this could be considered a logical adaptation to preserve the integrity of
the repair system, the turnover number for the automethylation reaction, kp, is so low as to
suggest that the enzyme-substrate complex, PCM·aPCM, is nearly a dead end, since the complex
decays to methylated product at a rate of less than once every hundred minutes (1/kp ≈ 105 min).
This combination of high affinity and low turnover suggests that as more aPCM is formed by
spontaneous aging, the enzyme could conceivably become self-occupied by its slow
self-methylation reaction, interfering with the methylation and further metabolic processing
of other age-damaged proteins.


CONCLUSIONS 

Asp/Asn  damage*  affects  numerous  if  not  all  proteins  both  in  vitro  and  in  vivo. 
Combined  approaches,  including  the  chemical  reduction  methods  described  here,  can  be 
used  to  measure  Asp/Asn  damage  in  a  given  protein.  While  protein  engineering  can  enable 
the  elimination  of  particularly  troublesome  sites  that  are  prone  to  Asp/Asn  damage,  this  may 
not  be  a  complete  answer  to  the  practical  problem  of  Asp/Asn  damage  since  it  is  not 
uncommon  for  proteins  to  develop  multiple  sites  of  Asp/Asn  damage,  and  in  the  long  run, 
essentially  every  Asp  or  Asn  residue  could  develop  some  degree  of  damage.  Nature  has 
evidently  taken  an  active  approach  to  solving  the  problem  of  Asp/Asn  damage  by  selecting 
for  the  presence  in  most  living  cells  of  a  PCM  activity  that  specifically  methylates  and 
metabolizes  Asp/Asn  protein  damage.  PCM  is  being  exploited  increasingly  as  an  in  vitro 
tool  to  diagnose  sites  of  Asp/Asn  damage,  and  given  the  present  good  understanding  of 
methylation-dependent  repair  of  Asp/Asn  damage  it  is  now  conceivable  that  a  similarly 
active  approach  can  be  used  to  repair  damaged  proteins  of  pharmaceutical  or  industrial 
importance  and  to  help  ensure  the  preservation  of  activity  of  proteins  during  in  vitro  reactions 
of various kinds. Untangling the full complexity of the metabolism of intracellular Asp/Asn
damage is still in the future, though, and factors such as Asp/Asn damage in PCM itself and
the resulting automethylation of PCM may have considerable importance in determining the
effectiveness of the metabolic systems that process damaged protein.

*Asp/Asn damage is the subject of a recent excellent book, Deamidation and Isoaspartate Formation in
Peptides and Proteins, Aswad, D.W., Ed., CRC Press, Boca Raton, FL, 1995.

Figure 8. The effect of dilution on the rate of PCM automethylation. The initial rate of PCM automethylation
(percent of the total PCM methylated per minute) was measured as a function of enzyme dilution (total PCM, µM).
Since the reactions contained differing volumes of purified PCM, the additional volumes were replaced with buffer
containing the same salts as the purified PCM solution. The experiment was repeated using different samples
of anion-exchange-fractionated PCM, containing estimated percentages of aPCM as follows: ○, 8-10% aPCM;
•, 2% aPCM; ■, 1.5% aPCM. These estimates of aPCM content were made by measuring
the final level of automethylation in extended time courses, taking into account a small competing rate of
ester hydrolysis that occurs in such time courses. Methyl ester formation in the PCM polypeptide chain was
quantified following either HPLC separation (•) or acidic gel electrophoresis (○, ■). The theoretical
saturation curves were generated by the equation in the text, using kp = 0.0095 min⁻¹, Ks = 0.50 µM PCMtot,
and the values for α shown in this figure.
INTRODUCTION

Ribosomes are complex organelles that contain 50 to 80 different ribosomal
proteins (r-proteins) and several RNA molecules (rRNAs). Bacterial ribosomes sediment at 70S,
with 30S and 50S subunits, whereas eukaryotic ribosomes have a sedimentation coefficient of 80S,
with 40S and 60S subunits. Accordingly, the ribosomes from different organisms vary considerably,
although their main function, namely the translation of the genetic message into
proteins, is generally conserved. Without detailed knowledge of the molecular structure of
the components and their topography within the organelle, the translational processes, in
which more than 200 molecules are involved, cannot be understood. Therefore, information
on the nearest neighbors of the RNAs and the r-proteins and their individual domains
is the basis for understanding the translational machinery at the molecular level.

Several approaches have led to general models of the topography of E. coli ribosomes
(Brimacombe et al., 1990; Walleczek et al., 1988). These models were assembled from: (i) immuno-electron
microscopy data obtained from binding studies of antibodies directed against individual
purified ribosomal proteins to the bacterial ribosome and its subunits (Stöffler-Meilicke and
Stöffler, 1990); (ii) reconstitution assays employing individual components, whereby the
so-called assembly maps of the ribosomal constituents were established for the 30S and 50S
subunits, respectively (Nomura and Held, 1976; Nierhaus and Dohme, 1974); (iii) neutron
scattering, yielding distances between various r-proteins within the subunit (Moore et al.,
1986; Nowotny et al., 1986); (iv) X-ray analysis of crystals of intact bacterial ribosomal
particles, yielding information about the shape of the ribosome and the arrangement of
structured elements within the subunits (Yonath et al., 1990; Wittmann et al., 1982); and (v)
cross-linking experiments, yielding information about protein-protein, RNA-RNA, and
protein-RNA interactions.

Cross-linking of ribosomal RNA to ribosomal proteins is a straightforward approach
to gain insight into the structural arrangement of these components within the complex. Many
cross-link sites between ribosomal proteins and rRNA have been analyzed at the rRNA level in
E. coli (for review see Brimacombe, 1991), whereas the exact cross-link position to the rRNA
has also been identified at the amino acid level only for E. coli ribosomal proteins S7 and L4
(Ehresmann et al., 1976; Moller et al., 1978; Maly et al., 1980). On the other hand,
more information about protein neighborhoods has been gained by cross-linking proteins within
the ribosome (Traut et al., 1986), and several cross-linked protein-protein positions have been
identified at the amino acid level (Allen et al., 1979; Pohl and Wittmann-Liebold, 1988;
Brockmoller and Kamp, 1988; Herwig et al., 1993; Bergmann and Wittmann-Liebold, 1993).

Only by precisely determining the direct contact points between the constituents can
the three-dimensional structure of the ribosome be solved at the molecular level. Without
this knowledge, the mechanism of the translational machinery cannot be understood.

Here we present a method for the analysis of rRNA-protein cross-links induced in
ribosomal subunits from E. coli and Bacillus stearothermophilus at the amino acid and
nucleotide level, and we discuss the results obtained with various peptides cross-linked to
the 16S or 23S RNAs in E. coli or B. stearothermophilus ribosomes.


EXPERIMENTS 

Cross-Linking 

Cross-linking of the ribosomal subunits was done with the heterobifunctional reagent
2-iminothiolane (Traut et al., 1973; Wower et al., 1981) or by mild UV irradiation for 10-15
min, at a concentration of 5 A260 units/ml in a buffer consisting of 5 mM magnesium acetate,
50 mM KCl, 6 mM β-mercaptoethanol, and 10 mM Tris-HCl (pH 7.8), as described by Moller
et al. (1978). After cross-linking, the ribosomal subunits were redissolved in 25 mM Tris buffer,
pH 7.8, containing 0.1% SDS, 2 mM EDTA, and 6 mM β-mercaptoethanol.
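Assembling the cross-linking buffer from concentrated stocks is a routine C1V1 = C2V2 calculation. In the sketch below the target concentrations are those listed above, while the stock concentrations and the 100 ml final volume are hypothetical choices, not taken from the source (neat β-mercaptoethanol is approximately 14.3 M). pH adjustment of the Tris stock is assumed to have been done beforehand.

```python
# Target composition of the cross-linking buffer (from the text), in mM.
TARGET_MM = {
    "magnesium acetate": 5.0,
    "KCl": 50.0,
    "beta-mercaptoethanol": 6.0,
    "Tris-HCl pH 7.8": 10.0,
}

# Hypothetical stock concentrations in mM (assumptions for illustration).
STOCK_MM = {
    "magnesium acetate": 1000.0,
    "KCl": 2000.0,
    "beta-mercaptoethanol": 14300.0,  # approx. neat liquid
    "Tris-HCl pH 7.8": 1000.0,
}


def stock_volumes_ml(final_volume_ml, target_mm, stock_mm):
    """Volume of each stock (C1*V1 = C2*V2) plus the water needed to make up the volume."""
    volumes = {}
    for name, c_final in target_mm.items():
        c_stock = stock_mm[name]
        if c_stock < c_final:
            raise ValueError(f"stock of {name} is less concentrated than the target")
        volumes[name] = c_final * final_volume_ml / c_stock
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes


vols = stock_volumes_ml(100.0, TARGET_MM, STOCK_MM)
for name, vol in vols.items():
    print(f"{name:>22}: {vol:8.3f} ml")
```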

Isolation  of  Cross-Linked  RNA-Proteins 

The  general  strategy  for  the  isolation  of  the  proteins  cross-linked  to  rRNA  is  given 
in  Fig.  1. 

Cross-linked proteins were separated from the non-cross-linked moiety either by
sucrose gradient centrifugation or by size exclusion chromatography on an S300 column
(Pharmacia LKB Biotechnologie, Uppsala, Sweden) in the same buffer (see Figure 2). The
proteins cross-linked to the rRNA eluted together with the non-cross-linked rRNA moiety in
the chromatogram.

Isolation  of  Cross-Linked  Oligonucleotide-Peptide  Heteromers 

For digestion of the cross-linked ribosomal proteins, the rRNA-containing fractions
were precipitated and redissolved in an appropriate volume of buffer. Proteins cross-linked
to the rRNA were digested with endoproteases Lys-C, Glu-C, or chymotrypsin. The remaining
cross-linked peptides were separated from released peptides by size exclusion chromatography
as described above. For identification of the cross-linked peptides in the fractions, the
rRNA was fully digested with ribonucleases A and T1 or partially digested with ribonuclease
T1, and the sample was injected directly onto an HPLC LiChrospher® 100 RP-18 endcapped column
(250 × 4 mm, 5 µm; E. Merck, Darmstadt, Germany) (see Figure 3).
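Which peptide carries a given cross-linked residue depends on the protease used. The sketch below performs a simplified in-silico digestion and locates the peptide spanning a residue of interest. The cleavage rules are simplified assumptions rather than exact enzyme specificities (Lys-C after K; Glu-C after E; chymotrypsin after F, Y, or W but not before P), the `blocked` option encodes the further assumption that a protease will not cut after a residue whose side chain carries the rRNA adduct, and the test sequence is hypothetical.

```python
from typing import Iterable, List, Tuple

# Simplified cleavage rules (assumptions for illustration): residues after
# which each protease cuts, and whether a following proline suppresses the cut.
RULES = {
    "Lys-C": (frozenset("K"), False),
    "Glu-C": (frozenset("E"), False),
    "chymotrypsin": (frozenset("FYW"), True),
}

Peptide = Tuple[int, int, str]  # (start, end, sequence), 1-based inclusive


def digest(seq: str, enzyme: str, blocked: Iterable[int] = ()) -> List[Peptide]:
    """Cleave `seq` in silico.

    `blocked` holds 1-based positions where cleavage is assumed not to occur,
    for example a residue whose side chain carries the rRNA cross-link.
    """
    sites, no_cut_before_pro = RULES[enzyme]
    blocked_set = set(blocked)
    peptides: List[Peptide] = []
    start = 0  # 0-based index where the current peptide begins
    for i in range(len(seq) - 1):  # never cut after the last residue
        pos = i + 1  # 1-based position of seq[i]
        if seq[i] not in sites or pos in blocked_set:
            continue
        if no_cut_before_pro and seq[i + 1] == "P":
            continue
        peptides.append((start + 1, pos, seq[start:pos]))
        start = pos
    peptides.append((start + 1, len(seq), seq[start:]))
    return peptides


def peptide_containing(peptides: List[Peptide], position: int) -> Peptide:
    """Return the peptide that spans a given 1-based residue position."""
    for start, end, pep in peptides:
        if start <= position <= end:
            return start, end, pep
    raise ValueError(f"position {position} is not covered")


if __name__ == "__main__":
    seq = "AGKTLKVYEK"  # hypothetical sequence
    print(digest(seq, "Lys-C"))
    # Suppose Lys-6 carries the cross-link and is therefore not cleaved:
    print(peptide_containing(digest(seq, "Lys-C", blocked={6}), 6))
```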

Solvent A was water with 0.1% TFA; solvent B was acetonitrile with 0.1% TFA. Fractions
that showed absorption at both 220 and 260 nm (corresponding to the cross-linked
peptide-nucleotide portion) were sequenced directly in a Model 477A pulsed-liquid gas-phase
sequencer equipped with a Model 120A amino acid analyzer (Applied Biosystems, Inc., Foster City,
USA). Sequences were identified by comparison with known ribosomal sequences (Wittmann-Liebold
et al., 1990) in the NBRF databank (National Biomedical Research Foundation,
Washington DC, USA). A typical Edman degradation run is shown in Figure 4 for an individual
peptide from protein BstL6, starting at position 149 with PTH-alanine, cross-linked to an
oligonucleotide of the 23S RNA. In the 8th degradation step the expected tyrosine-156 could
not be detected because of its covalent attachment to the rRNA; it is designated X in Figure 4.
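Locating a cross-link from an Edman run amounts to mapping each cycle to a residue position (position = start + cycle - 1) and flagging cycles in which the expected PTH-amino acid is missing. The sketch assumes the expected sequence is already known from the database; the peptide below is hypothetical and only mirrors the example above (start at position 149, blank at cycle 8).

```python
from typing import List, Tuple


def residue_position(start: int, cycle: int) -> int:
    """1-based residue position released in a given Edman cycle."""
    return start + cycle - 1


def blank_cycles(start: int, expected: str, calls: str,
                 blank: str = "X") -> List[Tuple[int, int, str]]:
    """Return (cycle, position, expected residue) wherever no PTH-amino acid was called."""
    if len(calls) > len(expected):
        raise ValueError("more calls than expected residues")
    hits = []
    for cycle, (exp_aa, call) in enumerate(zip(expected, calls), start=1):
        if call == blank:
            hits.append((cycle, residue_position(start, cycle), exp_aa))
    return hits


if __name__ == "__main__":
    expected = "AKGVELSYRG"  # hypothetical stretch beginning at residue 149
    calls = "AKGVELSXRG"     # cycle 8 gave no detectable PTH-amino acid
    print(blank_cycles(149, expected, calls))  # [(8, 156, 'Y')]
```

A blank cycle is only evidence, not proof, of a cross-link: poor recovery of some PTH derivatives can also produce gaps, which is why the assignment above additionally relies on the following residues being positively identified.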

RESULTS 

Cross-Linking  Sites 

Using the strategy above we were able to localize and sequence peptides of the
ribosomal proteins S7 and S17 from the small ribosomal subunit and of L4, L6, and L14 from
the large ribosomal subunit cross-linked to the rRNA, as shown in Figure 5.
Figure 2. Separation of cross-linked ribosomal proteins of the large subunit from B. stearothermophilus from
the non-cross-linked moiety by size exclusion chromatography. Cross-linked proteins were verified by SDS-PAGE.
XL: SDS-PAGE of cross-linked ribosomal subunits after size exclusion chromatography; cross-linked proteins are
indicated by arrows. Control: SDS-PAGE of non-cross-linked ribosomal subunits after size exclusion
chromatography. For SDS gel electrophoresis, all aliquots from size exclusion chromatography were digested
with 1 µg RNase A (present as one band in the lower part of the gels).


The Binding Domains of Proteins S7 and S17 on the 16S RNA

From the 30S ribosomal subunit we isolated peptides of protein S7 (fragment
positions 109-129 from E. coli and fragment positions 1-21 and 114-135 from B. stearothermophilus).
During Edman degradation of the cross-linked fragments, no PTH-methionine was detected in
position 114 of E. coli protein S7 or in position 115 of B. stearothermophilus protein S7,
whereas the following amino acid residues in the sequence could be positively
identified, indicating that these methionines are the cross-link sites to the rRNA
oligonucleotide. These data are in agreement with those of Moller et al. (1978), who found the
cross-link between methionine-114 and uridine-1240 in the 16S rRNA of E. coli. In addition,
we identified the lysine in position 8 of B. stearothermophilus protein S7 as a second
cross-link site. We also identified a fragment of protein S17 in both bacteria (at positions
19-35 in E. coli and at positions 21-37 in B. stearothermophilus) cross-linked to the 16S rRNA
at lysine 29 (E. coli) and lysine 31 (B. stearothermophilus). This domain of S17 was assumed
to be in close contact with the rRNA on the basis of 1H-NMR and 15N-NMR structure analysis
(Golden et al., 1993b). Our data resolve, for the first time, the S17 contact site on the 16S
rRNA at the amino acid level.

Figure 3. RP-HPLC chromatogram of cross-linked ribosomal peptides of the large ribosomal subunit from B.
stearothermophilus after digestion with RNase T1.

Figure 4. Individual Edman degradation steps of an isolated cross-linked peptide of the ribosomal protein L6
from B. stearothermophilus. PTH-amino acids are given in the one-letter code; X marks steps in which no
PTH-amino acid was detectable because the residue is covalently attached to the rRNA.

Figure 5. Identified homologous peptides (bold), with the amino acids cross-linked to the rRNA marked by
arrows, from ribosomal subunits of E. coli (Eco) and B. stearothermophilus (Bst).

Peptide-rRNA  Cross-Links  within  the  Large  Ribosomal  Subunit 

Within the 50S ribosomal subunit of both bacteria we sequenced peptides of protein
L14 extending from position 24 to position 44 in B. stearothermophilus and to position 39 in E. coli.
Both peptides contain a tyrosine in position 32, but only in B. stearothermophilus could this amino
acid be identified as the cross-link site to the 23S rRNA. In the same way we detected
tyrosine-35 in protein L4 cross-linked to the 23S rRNA within the 50S ribosomal subunit of
E. coli, which confirms the data of Maly et al. (1980) for the cross-link of tyrosine-35
to uridine-615 in the 23S rRNA. As mentioned above, we found tyrosine-156 in protein L6
from B. stearothermophilus cross-linked to the 23S rRNA. These data again agree well with
the three-dimensional structure of the protein: the recently published three-dimensional
structure of protein L6 shows tyrosine-156 within the putative RNA-binding domain (Golden
et al., 1993a).


CONCLUSIONS 

The new approach directly determines the amino acid residues of ribosomal proteins
involved in interaction with the rRNA and thereby generates data on the structural organisation
of the ribosome at the molecular level. By screening ribosomal subunits from both
bacteria, 30 further peptides cross-linked to the rRNA could be identified at the molecular
level (H. Urlaub and B. Wittmann-Liebold, manuscript in preparation). Many of the peptides
found to be cross-linked to the rRNA are homologous in both organisms, as shown for
peptides derived from ribosomal proteins S7, S17, and L14, revealing a nearly identical
topography of the contact sites in these organisms. The cross-linked amino acids found were
basic residues, particularly lysines, after chemical reaction with 2-iminothiolane, and tyrosines
and methionines after cross-linking via mild UV irradiation.

The peptide sequences determined so far show no significant sequence
similarity to other structural sequence elements found in RNA complexes, such as the common
RNP motif (for review see Mattaj, 1989), although the isolated peptides are rich in basic
residues (lysine, arginine), aromatic residues (especially tyrosine), and small hydrophobic
amino acids (glycine, valine, leucine, and isoleucine). Furthermore, precise analysis of
the cross-link site at the amino acid level makes it possible to establish whether one or more
domains of a ribosomal protein are in direct contact with the rRNA and which secondary structure
elements are involved. Experiments to analyze the oligonucleotide part of the various
sequenced cross-linked peptides are in progress in order to determine the corresponding
nucleotide sites on the rRNAs.
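A statement such as "rich in basic and aromatic residues" can be made quantitative by tallying residue classes in each sequenced peptide. The grouping below follows the classes named above (basic: K, R; small hydrophobic: G, V, L, I), with F and W added to the aromatic class alongside Y as an assumption; the example peptide is hypothetical. A fuller analysis would compare these fractions against the background composition of the ribosomal proteins.

```python
from collections import Counter

# Residue classes named in the text; F and W in "aromatic" are an added assumption.
CLASSES = {
    "basic": frozenset("KR"),
    "aromatic": frozenset("FWY"),
    "small hydrophobic": frozenset("GVLI"),
}


def class_fractions(peptide: str) -> dict:
    """Fraction of residues in each class (one-letter code, case-insensitive)."""
    if not peptide:
        raise ValueError("empty peptide")
    counts = Counter(peptide.upper())
    n = len(peptide)
    return {name: sum(counts[aa] for aa in members) / n
            for name, members in CLASSES.items()}


if __name__ == "__main__":
    pep = "GKVYRLKIGY"  # hypothetical peptide
    for name, frac in class_fractions(pep).items():
        print(f"{name:>17}: {frac:.0%}")
```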
INTRODUCTION

Adrenodoxin belongs to the family of [2Fe-2S]-type ferredoxins, which are widely
distributed in bacteria, plants, and animals. Although adrenodoxin is a small (~14 kDa) and
soluble protein, its three-dimensional structure has not yet been elucidated. It functions
as an electron carrier from the FAD-containing, NADPH-dependent ferredoxin reductase to
the cytochromes P450scc (CYP11A1), which catalyzes the side-chain cleavage of cholesterol,
the initial step in adrenal steroidogenesis, and P45011β (CYP11B1), which is involved in
the formation of cortisol and aldosterone (Fig. 1).

Different models of electron transfer via adrenodoxin have been proposed: the
"shuttle" model (Hanukoglu and Jefcoate, 1980), formation of a ternary complex of adrenodoxin
reductase, adrenodoxin, and the cytochrome P450 (Kido and Kimura, 1979), and a
model suggesting the occurrence of two adrenodoxin molecules in the electron transport
chain (Hara et al., 1994).

Recognition and interaction of adrenodoxin with adrenodoxin reductase and
CYP11A1 have been shown to be mainly electrostatic in nature. Replacement of acidic amino
acids 76 and 79, which had been proposed to be involved in protein interaction on the basis
of chemical modification studies (Geren et al., 1984), with neutral amino acids resulted in
a considerable decrease in activity and in decreased affinity of the mutant adrenodoxins for
their reaction partners (Coghlan and Vickery, 1991). This result is supported by mutation of
the conserved lysine residues 377 and 381 of CYP11A1 to either neutral or negative amino
acids. These mutations have been shown to greatly weaken adrenodoxin binding, indicating
that these lysines are key sites in the binding of bovine adrenodoxin by CYP11A1 (Wada and
Waterman, 1992). There are only a few reports on the involvement of other amino acid
residues in the recognition and interaction sites of adrenodoxin with the reductase and
the P450s.

Adrenodoxin is synthesized in the cytoplasm as a larger precursor molecule. The 58
amino acids of the N-terminal leader peptide are removed upon mitochondrial uptake (Nabi
et al., 1983; Sagara et al., 1984; Matocha and Waterman, 1984). The primary structure of
mature adrenodoxin was first elucidated by amino acid sequencing (Tanaka et al., 1973) and
shown to consist of 114 residues. Later, a 14-amino-acid C-terminal extension peptide was
found in the nucleotide sequence of adrenodoxin cDNA (Okamura et al., 1985), so that mature
full-length adrenodoxin contains 128 amino acids. Western blotting using an antibody
against a peptide consisting of C-terminal amino acids 115-128 of adrenodoxin revealed the
presence of an adrenodoxin longer than 114 amino acids in adrenocortical mitochondria
(Bhasker et al., 1987).

When adrenodoxin is isolated from bovine adrenals (Driscoll and Omdahl, 1986;
Hiwatashi et al., 1986; Sagara et al., 1984; Sagara et al., 1992; Sakihama et al., 1988),
multiple forms of the protein are observed. Protein-chemical analysis revealed C-termini of
different lengths, with the protein ranging from 114 amino acids (Tanaka et al., 1973) through
121, 124, and 125 amino acid residues (Hiwatashi et al., 1986) up to 127 amino acids (Sakihama
et al., 1988). However, when bovine adrenodoxin was purified in the presence of protease
inhibitors, a protein consisting of 128 amino acids could be obtained, as determined by
carboxypeptidase Y digestion (Cupp and Vickery, 1989).

In vitro studies have shown that adrenodoxin from which amino acids 116-128 were
removed by trypsin cleavage has a UV/visible spectrum identical to that of native
adrenodoxin (Cupp and Vickery, 1989). Furthermore, it has been demonstrated that
adrenodoxin lacking residues 116-128 exhibited higher biological activity towards
CYP11A1 and CYP11B1 and higher affinity for CYP11A1 than the full-length
molecule, whereas its interaction with adrenodoxin reductase was not significantly affected
(Cupp and Vickery, 1989).


Protein-Protein  Interactions  in  Mitochondrial  Steroid  Hydroxylase  Systems 




To systematically study the role of the C-terminal region of adrenodoxin and of
different amino acid residues in interaction with the electron donor, adrenodoxin reductase,
and the electron acceptors, CYP11A1 and CYP11B1, mutants of adrenodoxin have been
prepared by site-directed mutagenesis and expressed in E. coli as described (Uhlmann et al.,
1992), and their structural and functional properties have been characterized in detail.


MATERIALS  AND  METHODS 

E.  coli  strains  HBlOl  and  BL21  were  used  as  host  strains.  Site-directed  mutagenesis 
and  synthesis  of  deletion  mutants  were  performed  using  PCR  as  described  recently  (Uhlmann 
et  aL,  1992,  Beckert  et  aL,  1994).  Mutants  of  adrenodoxin  were  expressed  in  a  high-level 
expression  system  using  the  expression  vector  pKKAdx  (Uhlmann  et  al.  1992).  Bacteria 
were  grown  and  recombinant  adrenodoxin  was  purified  as  previously  described  (Uhlmann 
et  al.,  1992).  Adrenodoxin  reductase,  CYPllAl  and  CYPllBl  were  isolated  from  bovine 
adrenals  according  to  Akhrem  et  al.  (1979).  Proteins  were  analyzed  as  described  (Uhlmann 
et  al.,  1994).  The  amino  acid  compositions  of  the  recombinant  proteins  were  determined  by 
vaporphase  hydrolysis  in  a  SYKAM  analyzer.  N-termini  were  analyzed  using  a  4 77 A  gas 
phase  sequenator  (Applied  Biosystems).  Mass  spectrometry  measurements  were  carried  out 
on  a  FINNIGAN  MAT  triplestage  quadrupole  TSQ  700  instrument.  EPR  spectroscopy  was 
carried  out  at  -IbS'^C  on  a  Varian  E3  spectrometer  using  whole  E.  coli  cells  with  expressed 
proteins  which  were  reduced  by  dithionite.  CD  spectra  were  recorded  on  a  Jasco  J720 
spectropolarimeter  at  room  temperature  in  the  ultraviolet  and  visible  region.  Fluorescence 
spectra were taken on an RF-5001 PC spectrofluorometer at room temperature. The excitation
wavelength was 270 nm. Redox potentials of adrenodoxin mutants were measured by the
dye photoreduction method with Safranin T as indicator and mediator (Sligar et al., 1979).
Data were analyzed according to the Nernst equation. Cytochrome c reduction was assayed
in 50 mM potassium phosphate buffer, pH 7.4, containing 0.1% Tween 20 at room
temperature. Reaction mixtures with the deletion mutants of adrenodoxin contained 0.05 µM
adrenodoxin reductase, 65 µM horse heart cytochrome c, various amounts of the respective
adrenodoxin and 140 µM NADPH. The mixtures with the substitution mutants contained
0.2 µM adrenodoxin reductase, 100 µM cytochrome c, variable amounts of recombinant
adrenodoxin and 100 µM NADPH. The reduction of cytochrome c was monitored at 550 nm.
Calculations were based on a molar extinction coefficient ε of 20 mM⁻¹ cm⁻¹ for cytochrome c.
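As a worked example of the calculation above, the rate of cytochrome c reduction follows from the slope of A550 against time through the Beer-Lambert law. The Python sketch below is not the authors' software; it assumes a 1 cm light path and a hypothetical slope, and takes only the extinction coefficient (20 mM⁻¹ cm⁻¹) and the 0.05 µM reductase concentration from the text.

```python
# Minimal sketch (not the authors' software): cytochrome c reduction rate
# from the slope of A550 versus time, via the Beer-Lambert law.
# Assumed, not from the source: 1 cm cuvette path and the example slope.

EPSILON_550_PER_MM_CM = 20.0  # molar extinction coefficient used in the Methods


def cyt_c_reduction_rate_um_per_min(delta_a550_per_min: float,
                                    path_cm: float = 1.0) -> float:
    """Rate of cytochrome c reduction in µM/min.

    dc/dt = (dA/dt) / (epsilon * l) gives mM/min; x1000 converts to µM/min.
    """
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    rate_mm_per_min = delta_a550_per_min / (EPSILON_550_PER_MM_CM * path_cm)
    return rate_mm_per_min * 1000.0


def rate_per_reductase(rate_um_per_min: float, reductase_um: float) -> float:
    """Normalize the rate to the adrenodoxin reductase concentration (min^-1)."""
    return rate_um_per_min / reductase_um


if __name__ == "__main__":
    # Hypothetical slope of 0.2 A550 units per minute; 0.05 µM reductase.
    rate = cyt_c_reduction_rate_um_per_min(0.2)
    print(f"{rate:.1f} µM/min, "
          f"{rate_per_reductase(rate, 0.05):.0f} per min per reductase")
```

Rates obtained in this way at several adrenodoxin concentrations are the input from which Michaelis constants such as those in Table 3 can be estimated.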

The  cholesterol  side-chain  cleavage  activity  was  measured  in  the  reconstituted  assay 
system according to Sugano et al. (1989). The incubation mixtures contained 0.5 µM
CYP11A1, 0.2 µM adrenodoxin reductase, 100 µM cholesterol, and an NADPH-regenerating
system for the substitution mutants of adrenodoxin (Beckert et al., 1994). Reaction mixtures
for the deletion mutants were prepared in the same way, except that they contained 0.5 µM
adrenodoxin reductase (Uhlmann et al., 1994). NADPH was added to start the reaction. After
a 10-minute incubation at 37°C, the reaction was stopped by boiling for 5 minutes. The
steroids were converted into their corresponding 4-en-3-one forms by adding cholesterol
oxidase  to  the  reaction  mixtures.  The  steroids  were  extracted  with  dichloromethane  and 
analyzed  by  reverse-phase  HPLC. 

CYP11B1-dependent 11β-hydroxylation of deoxycorticosterone was performed as
described (Beckert et al., 1994; Uhlmann et al., 1994) with 0.4 µM adrenodoxin reductase,
0.4 µM CYP11B1, 100 µM deoxycorticosterone, various amounts of adrenodoxin, and an
NADPH-regenerating system. The reaction was started by addition of NADPH and carried
out  for  10  minutes  at  37°C.  Dichloromethane  was  used  to  stop  the  reaction  and  to  extract 
the  steroids,  which  were  then  analyzed  by  HPLC. 
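Activities in the hydroxylation assays are reported in the Results as nmol product/min/nmol P450. A minimal sketch of that normalization follows, assuming the product amount has already been read from an HPLC calibration; the reaction volume and product amount in the example are hypothetical, and only the 0.4 µM CYP11B1 and the 10-minute incubation come from the Methods.

```python
# Hypothetical sketch: express an HPLC-quantified product amount as P450
# turnover (nmol product / min / nmol P450), the unit used for Vmax in the
# Results. Assumed, not from the source: the reaction volume and the product
# amount, which would come from an HPLC calibration curve.


def p450_turnover(product_nmol: float, p450_um: float,
                  volume_ml: float, minutes: float) -> float:
    """Return nmol product per minute per nmol P450.

    1 µM equals 1 nmol/ml, so p450_um * volume_ml is the nmol of P450 present.
    """
    p450_nmol = p450_um * volume_ml
    if p450_nmol <= 0 or minutes <= 0:
        raise ValueError("P450 amount and incubation time must be positive")
    return product_nmol / (minutes * p450_nmol)


if __name__ == "__main__":
    # 0.4 µM CYP11B1 and a 10 min incubation, as in the Methods;
    # the 0.5 ml volume and 16 nmol corticosterone are made-up example values.
    print(p450_turnover(product_nmol=16.0, p450_um=0.4,
                        volume_ml=0.5, minutes=10.0))
```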
Binding affinity of adrenodoxin to CYP11A1 was measured by affinity chromatography
on biotin-labelled CYP11A1 bound to an avidin-Sepharose column. Ten nanomoles of the
appropriate adrenodoxin sample were loaded onto the column, followed by washing to
remove unbound adrenodoxin. A KCl gradient (0-0.7 M; flow rate, 0.3 ml/min) was used
to elute bound adrenodoxin. The bound amount of each adrenodoxin mutant was expressed
relative to the binding capacity for wild-type adrenodoxin, which was set to 100%. Differential
spectral titration was performed according to Kido & Kimura (1979). Binding of cholesterol
to CYP11A1, facilitated by the binding of adrenodoxin, causes absorbance changes in the
Soret region (393-417 nm) of the cytochrome due to conversion of CYP11A1
from its low-spin to its high-spin form.
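The spectral titration yields a difference signal that, under a simple one-site binding model, rises hyperbolically with adrenodoxin concentration. The sketch below assumes that model and approximates free by total adrenodoxin; the double-reciprocal fit is one common way of extracting an apparent dissociation constant and is not necessarily the analysis of Kido & Kimura (1979).

```python
# Sketch: apparent dissociation constant from a difference-spectral titration.
# Model (an assumption; the text cites Kido & Kimura (1979) for the procedure
# but does not give the fitting method):
#   dA = dA_max * [Adx] / (Kd + [Adx]),  with dA = A393 - A417
# so 1/dA = (Kd/dA_max)(1/[Adx]) + 1/dA_max is linear in 1/[Adx].
# Free adrenodoxin is approximated by total adrenodoxin.


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def difference_absorbance(a393, a417, baseline_a393, baseline_a417):
    """Soret difference signal relative to the spectrum before titration."""
    return (a393 - a417) - (baseline_a393 - baseline_a417)


def apparent_kd(adx_um, delta_a):
    """Return (Kd in µM, dA_max) from a double-reciprocal fit."""
    slope, intercept = linear_fit([1.0 / c for c in adx_um],
                                  [1.0 / d for d in delta_a])
    delta_a_max = 1.0 / intercept
    return slope * delta_a_max, delta_a_max


if __name__ == "__main__":
    # Synthetic, noise-free titration with Kd = 0.5 µM and dA_max = 0.08.
    conc = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
    signal = [0.08 * c / (0.5 + c) for c in conc]
    kd, da_max = apparent_kd(conc, signal)
    print(f"Kd = {kd:.3f} µM, dA_max = {da_max:.3f}")
```

With real, noisy titrations a double-reciprocal fit over-weights the lowest concentrations, so a direct nonlinear fit of the hyperbola is usually preferable.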


RESULTS 

The  role  of  the  C-terminal  region  of  adrenodoxin  was  studied  by  analyzing  deletion 
mutants  4-128,  4-114,  4-108,  and  4-107,  lacking  amino  acids  1-3,  1-3  and  115-128,  1-3  and 
109-128,  or  1-3  and  108-128,  respectively.  In  addition  to  the  deletion  of  the  C-terminal 
peptides, amino acids 1-3 were removed to avoid proteolytic cleavage at the N-terminus
and to study the influence of these residues on adrenodoxin structure and function. To check
whether partial proteolytic digestion of the mutants occurs, all mutant proteins were analyzed
by amino acid analysis, mass spectrometry, and N- and C-terminal
microsequencing. The mutants were shown to be of the expected composition (Table 1), but
contained an additional N-terminal methionine, encoded by the start codon, that had not
been removed.


Table 1. Amino acid analysis of wild type adrenodoxin and adrenodoxin mutants

                 Wild type      Adrenodoxin mutants (obs)
Amino acid       th     obs     Y82F   Y82S   Y82L   4-114  4-108
Asx              20     20.0    19.8   19.1   18.8   17.1   18.6
Thr              10     9.5     9.0    8.8    8.4    9.8    10.8
Ser              9      8.7     7.4    6.7    5.5    4.2    2.4
Glx              13     13.1    12.0   11.5   11.0   10.7   11.9
Gly              10     10.3    10.3   9.7    9.4    8.2    9.0
Ala              7      7.0     7.0    7.0    7.0    7.0    5.0
Val              7      7.7     6.3    6.3    5.7    6.5    6.0
Met              5      5.5     5.4    4.4    4.7    4.0    4.6
Ile              10     9.3     8.8    7.6    7.6    7.5    8.3
Leu              12     11.9    12.4   12.2   13.4   12.2   13.9
Tyr              1      1.2     -      -      -      1.1    1.0
Phe              4      4.0     5.1    4.1    3.9    4.1    4.3
His              3      3.1     3.1    3.1    2.9    3.1    3.3
Lys              6      5.7     6.1    5.8    5.3    4.9    4.9
Arg              5      4.7     5.0    5.2    4.9    4.0    3.8
Pro              1      n.d.    n.d.   n.d.   n.d.   n.d.   n.d.
Cys              5      n.d.    n.d.   n.d.   n.d.   n.d.   n.d.
Total            128            128    128    128    112    106

th = theoretical number.
obs = observed number of the respective amino acid.
Table 2. Mass spectrometry of adrenodoxin mutants

Adx          Theoretical mass (Da)   Observed mass (Da)
Wild type    14017.8                 14015.5 ± 2.2
Y82F         14001.8                 14001.1 ± 1.6
4-128        13917.7                 13916.4 ± 2.5
4-114        12338.9                 12338.0 ± 1.5
4-108        11780.4                 11777.7 ± 1.5

A highly purified sample (A414/A276 > 0.9) of each adrenodoxin mutant was desalted and
dissolved in methanol/water (1:1), 1% acetic acid to a final concentration of 10 pmol/µl.
Mass spectra were recorded on a FINNIGAN MAT triple-stage quadrupole TSQ 700
instrument.


No  proteolytic  digestion  has  been  observed,  even  in  the  case  of  mutant  4-128 
(Table  2)  when  freshly  purified  proteins  were  analyzed.  In  contrast,  native  adrenodoxin  was 
shown  to  undergo  proteolytic  digestion  (Driscoll  and  Omdahl,  1986). 
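The agreement between observed and theoretical masses in Table 2 can be checked with simple arithmetic. The sketch below uses the Table 2 values; the average residue masses of tyrosine and phenylalanine are standard values supplied here, not taken from the source.

```python
# Sketch: compare the observed and theoretical masses in Table 2 and check
# that the theoretical Y82F mass reflects a Tyr -> Phe exchange. The average
# residue masses are standard values (Da), not taken from the source.

AVG_RESIDUE_MASS_DA = {"Tyr": 163.1760, "Phe": 147.1766}

TABLE_2 = {  # variant: (theoretical mass, observed mass) in Da
    "Wild type": (14017.8, 14015.5),
    "Y82F": (14001.8, 14001.1),
    "4-128": (13917.7, 13916.4),
    "4-114": (12338.9, 12338.0),
    "4-108": (11780.4, 11777.7),
}


def deviation(theoretical: float, observed: float) -> tuple:
    """Return (observed - theoretical) in Da and in parts per million."""
    delta = observed - theoretical
    return delta, delta / theoretical * 1e6


def substitution_shift(old: str, new: str) -> float:
    """Mass change in Da when residue `old` is replaced by residue `new`."""
    return AVG_RESIDUE_MASS_DA[new] - AVG_RESIDUE_MASS_DA[old]


if __name__ == "__main__":
    for name, (th, obs) in TABLE_2.items():
        da, ppm = deviation(th, obs)
        print(f"{name:10s} {da:+5.1f} Da  {ppm:+6.0f} ppm")
    predicted = substitution_shift("Tyr", "Phe")
    tabulated = TABLE_2["Y82F"][0] - TABLE_2["Wild type"][0]
    print(f"Y82F shift: predicted {predicted:.2f} Da, Table 2 {tabulated:.2f} Da")
```

The predicted Tyr-to-Phe shift (about -16.0 Da) matches the difference between the theoretical Y82F and wild-type masses in Table 2.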

Deletion of amino acids 1-3 did not lead to any significant changes in the structure
and function of adrenodoxin. The absorption spectra of all mutants studied were identical to
that of the wild type. However, EPR, CD, and redox potential measurements of mutants
4-114 and 4-108 revealed that the structure of these mutants differs from that of wild-type
adrenodoxin. EPR spectra of adrenodoxin are characterized by two g-values: g⊥ = 1.94 and
g∥ = 2.03 (Uhlmann et al., 1992). The deletion mutants 4-114 and 4-108 showed signals
in which g⊥ was at the same position as in native adrenodoxin but broadened, while
g∥ was shifted to a smaller value. The CD signals of these mutants were increased in all three
wavelength ranges measured (absorption of the peptide region, aromatic residues, and
iron-sulfur cluster). At 195 nm, the molar ellipticity increases from approximately 0 (wild type)
to 2,600 deg cm² dmol⁻¹ (mutant 4-114) and 5,500 deg cm² dmol⁻¹ (mutant 4-108). In
addition, the redox potentials of these mutants were lower than that of wild-type adrenodoxin.
Furthermore, mutant 4-107, which lacks P108, the single proline residue of adrenodoxin,
did not show EPR signals, indicating that P108 plays an essential role in the assembly
of the [2Fe-2S] cluster. Deletion of residues 115-128 or 109-128 did not substantially affect
the interaction with the electron donor adrenodoxin reductase, as shown by nearly unchanged
cytochrome c reduction activity. Although this reaction does not occur physiologically, it is
a widely used model for the electron transfer from reduced adrenodoxin reductase to
adrenodoxin, since the flavin-to-iron electron transfer appears to be the rate-limiting step in
cytochrome c reduction (Lambeth and Kamin, 1979). In contrast, the interaction with the
electron acceptors CYP11A1 and CYP11B1 was affected.
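The redox potentials referred to above were obtained by dye photoreduction with Safranin T and analyzed with the Nernst equation (see Methods). A minimal sketch of that analysis is shown below. It assumes equilibrium between dye and protein, 25 °C, a two-electron dye and a one-electron adrenodoxin couple, and a Safranin T midpoint potential of -289 mV; the last value is a commonly quoted literature figure supplied here, not one given in the source.

```python
# Minimal sketch of a Nernst analysis for a dye-equilibration redox titration.
# At equilibrium the dye and adrenodoxin couples report the same potential E:
#   E = Em_dye + (RT / n_dye F) ln([dye_ox]/[dye_red])
#     = Em_adx + (RT / n_adx F) ln([adx_ox]/[adx_red])
# Assumptions (not stated in the source): 25 °C, n_dye = 2 for Safranin T,
# n_adx = 1, and a Safranin T midpoint of -289 mV (literature value to verify).
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_mv(ratio_ox_red: float, n: int, temp_k: float = 298.15) -> float:
    """(RT/nF) ln([ox]/[red]) in millivolts."""
    return 1000.0 * R * temp_k / (n * F) * math.log(ratio_ox_red)


def adx_midpoint_mv(dye_ratio: float, adx_ratio: float,
                    dye_em_mv: float = -289.0, dye_n: int = 2,
                    adx_n: int = 1) -> float:
    """Midpoint potential of adrenodoxin (mV) from one equilibrium point."""
    solution_potential = dye_em_mv + nernst_mv(dye_ratio, dye_n)
    return solution_potential - nernst_mv(adx_ratio, adx_n)


def mean_midpoint_mv(points) -> float:
    """Average the estimate over (dye_ratio, adx_ratio) titration points."""
    estimates = [adx_midpoint_mv(d, a) for d, a in points]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Synthetic points for a hypothetical Em(Adx) of -270 mV.
    rt_f_mv = 1000.0 * R * 298.15 / F
    points = []
    for e in (-300.0, -280.0, -260.0):
        dye_ratio = math.exp((e + 289.0) * 2 / rt_f_mv)
        adx_ratio = math.exp((e + 270.0) / rt_f_mv)
        points.append((dye_ratio, adx_ratio))
    print(f"Em(Adx) = {mean_midpoint_mv(points):.1f} mV")
```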

In CYP11A1-dependent cholesterol conversion, mutants 4-108 and 4-114 exhibited
3-fold and 5-fold decreased Km values (Fig. 2A, Table 3), respectively, while the binding
affinity for CYP11A1 increased nearly 3-fold and 2-fold, respectively (Fig. 3).

The Vmax values did not change upon deletion of the C-terminal region. When
measuring the CYP11B1-dependent conversion of deoxycorticosterone to corticosterone,
mutants 4-108 and 4-114 again showed decreased Km values (6-fold and 3-fold, respectively;
Fig. 2B, Table 3). In this reaction, however, the Vmax values also increased, being 5.5 nmol
product/min/nmol CYP11B1 for wild-type adrenodoxin, 11.8 nmol product/min/nmol
CYP11B1 for mutant 4-114, and 19.7 nmol product/min/nmol CYP11B1 for mutant 4-108.
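The Km and Vmax values discussed here are parameters of the Michaelis-Menten equation, v = Vmax[Adx]/(Km + [Adx]), with adrenodoxin as the varied substrate. The source does not state how they were fitted; the sketch below uses the Hanes-Woolf linearization purely as an illustration, with noise-free synthetic data.

```python
# Sketch: estimate Km and Vmax from initial rates measured at several
# adrenodoxin concentrations, using a Hanes-Woolf linearization
#   [S]/v = [S]/Vmax + Km/Vmax.
# The source does not state how Km and Vmax were fitted; this linearization
# is one common choice and is used here only for illustration.


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hanes_woolf(substrate, rates):
    """Return (Km, Vmax) in the units of `substrate` and `rates`."""
    ys = [s / v for s, v in zip(substrate, rates)]
    slope, intercept = linear_fit(list(substrate), ys)
    vmax = 1.0 / slope
    return intercept * vmax, vmax


if __name__ == "__main__":
    # Noise-free synthetic data: Km = 1.2 µM adrenodoxin, Vmax = 5.5
    # nmol product/min/nmol P450 (values chosen only to exercise the code).
    adx = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
    v = [5.5 * s / (1.2 + s) for s in adx]
    km, vmax = hanes_woolf(adx, v)
    print(f"Km = {km:.2f} µM, Vmax = {vmax:.2f}")
```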
Figure 2. CYP11A1- and CYP11B1-dependent hydroxylation assays of the deletion mutants of adrenodoxin.
(A) CYP11A1-dependent cholesterol side-chain cleavage, producing pregnenolone. (B) CYP11B1-dependent
11β-hydroxylation of deoxycorticosterone. The reactions were carried out in a reconstituted system containing
an NADPH-regenerating system, adrenodoxin reductase, variable amounts of adrenodoxin, the cytochrome P450
(CYP11A1 or CYP11B1), and the respective substrate.


Table 3. Michaelis constants of the adrenodoxin mutants

Adrenodoxin   Cytochrome c reduction   CYP11A1 assay   CYP11B1 assay
mutant        assay, Km (nM)           Km (µM)         Km (µM)

Wild type     5.9 ± 0.9                1.1 ± 0.1       1.2 ± 0.2
4-128         5.6 ± 0.5                1.1 ± 0.1       1.3 ± 0.1
4-114         3.4 ± 0.2                0.2 ± 0.02      0.4 ± 0.04
4-108         7.3 ± 1.0                0.4 ± 0.04      0.2 ± 0.02

Wild type     6.4 ± 0.1                1.8 ± 0.05      1.7 ± 0.24
Y82F          4.7 ± 0.2                1.2 ± 0.04      0.4 ± 0.03
Y82S          5.8 ± 0.2                2.5 ± 0.07      4.6 ± 0.15
Y82L          6.1 ± 0.2                2.1 ± 0.05      0.7 ± 0.09

Wild type     19.1 ± 0.2               2.5 ± 0.07      2.0 ± 0.15
H56Q          40.2 ± 0.5               3.2 ± 0.04      12.2 ± 0.62
H56T          54.2 ± 0.4               3.4 ± 0.06      8.1 ± 0.25
Figure 3. Binding affinity of the adrenodoxin deletion mutants
to CYP11A1, determined by optical titration according to Kido
& Kimura (1979). The absorption change of CYP11A1 in the
Soret region (393-417 nm), induced by binding of cholesterol
facilitated by binding of adrenodoxin, was monitored.
[Plot legend: wt, 4-114, 4-108]


The data suggest that the electron transfer-coupled interaction of adrenodoxin with
CYP11A1 and CYP11B1 is determined at least in part by different features of the
cytochromes. This observation is further supported by site-directed mutagenesis studies of
amino acid residues Y82 and H56.

As shown in Table 1, adrenodoxin does not contain tryptophan and contains only one
tyrosine residue, at position 82. This unique tyrosine, which had previously been proposed
on the basis of chemical modification studies to be involved in reductase binding and/or
electron transfer (Taniguchi and Kimura, 1975, 1976), was replaced by
phenylalanine, leucine or serine. Again, the mutants were tested for amino acid composition
(Table 1), mass spectrometry (Table 2), the structural integrity of the iron-sulfur cluster,
function in enzymatic assays (with cytochrome c, CYP11A1 and CYP11B1 as electron
acceptors) and binding to CYP11A1. As demonstrated in Table 1, the mutations can be
clearly identified at the protein level by amino acid analysis of wild-type
and mutant proteins. Amino acid composition and mass spectrometry of mutant Y82F
(Tables 1, 2) revealed that there was no proteolytic digestion of the mutant protein.
N-terminal microsequencing revealed residues GSSEDK, corresponding to amino acids 2-6
of native adrenodoxin with a substitution of the N-terminal serine to glycine, which was
engineered to improve the proteolytic stability of the protein. Unchanged absorption, CD
and EPR spectra as well as redox potential measurements indicate that the environment of
the [2Fe-2S] cluster was not affected by the mutations (Beckert et al., 1994). Replacement
of Y82 also did not affect adrenodoxin reductase binding, as shown by unchanged cytochrome
c reduction activity. Determination of the hydroxylating activities of CYP11A1 and CYP11B1
reconstituted with adrenodoxin mutants, however, indicated marked changes in the Km values,
of up to 4-fold (Table 3), at unchanged Vmax. These changes again depend on the P450
(CYP11A1 or CYP11B1) used, with tyrosine replacement having a more pronounced effect on
the interaction with CYP11B1 than on that with CYP11A1.

Since H56 has been proposed to lie close to the region between E74 and D86, which is
involved in ferredoxin reductase and P450 binding by adrenodoxin (Miura and
Ichikawa, 1991), we checked whether replacement of H56 by other amino acids would
change the interaction with adrenodoxin reductase, CYP11A1 and CYP11B1. First, we
investigated whether replacements at position 56 would affect the fluorescence of Y82.
Indeed, replacement of H56 by glutamine or threonine changed the intensity of the
adrenodoxin tyrosine fluorescence (Fig. 4). Thus, H56 appears to be in the immediate
vicinity of Y82 and therefore of the interface of adrenodoxin with its redox partners.

Figure 4. Intensity of Y82 fluorescence of wild-type adrenodoxin and H56 mutants.
Fluorescence measurements of 80 µM wild-type adrenodoxin and H56 mutants were carried
out in 10 mM potassium phosphate buffer, pH 7.4, at room temperature with excitation at
270 nm. The fluorescence intensity of wild-type adrenodoxin was set to 100%.

Furthermore, replacement of H56 by these amino acids led to changes in adrenodoxin
reductase binding, as indicated by an increased Km value for cytochrome c reduction, and
also to changes in the Km values of CYP11A1- and CYP11B1-dependent substrate
conversions (Table 3).


DISCUSSION 

The importance of various amino acid residues as well as the N- and C-terminal
regions of adrenodoxin for its structure and function as electron mediator in mitochondrial
steroid hydroxylases has been investigated using site-directed and deletion mutants.
Particular attention was paid to the effect of the N- and C-terminal regions on the interaction
of adrenodoxin with various redox partners, to the role in protein structure and function of
the unique proline residue 108, which is conserved among vertebrate ferredoxins (Usanov
et al., 1990), and to the importance of tyrosine 82 and histidine 56 in protein-protein
interaction. The results clearly demonstrate that functional and structural analysis of proteins
altered by site-directed mutagenesis requires thorough verification of the intended changes
at the protein level.

The mutants were characterized by nucleotide sequencing as well as by N-terminal
microsequencing, amino acid analysis and mass spectrometry. As seen in Tables 1 and 2,
the C-terminal region of wild-type adrenodoxin and of the mutants did not undergo
proteolytic degradation under our experimental conditions.
Since the deletion mutants, in addition to the C-terminal deletion, lack the three N-terminal
serines to avoid the partial cleavage observed in previous studies on
native adrenodoxin (Coghlan et al., 1988; Mittal et al., 1988), the effects of deleting
amino acids 1-3 and introducing a methionine on the properties of mutant 4-128
were investigated. Mutant 4-128 behaved similarly to the
wild type with regard to spectral properties, redox potential, kinetics in cytochrome c
reduction, CYP11A1- and CYP11B1-dependent conversion assays and reduction rates of
cytochromes P450. The binding affinity of mutant 4-128 for CYP11A1 was also identical
to that of wild-type adrenodoxin.

Considering the C-terminal domain of adrenodoxin, P108 was revealed to be
critical for the formation of a biologically active protein. Mutant 4-108, which contains
P108, exhibits the unchanged absorption characteristics of adrenodoxin and functions as an
electron shuttle to cytochrome c, CYP11A1 and CYP11B1, whereas mutant 4-107 did
not incorporate the [2Fe-2S] cluster. Thus, it was concluded that P108
promotes correct folding of adrenodoxin, a necessary prerequisite for the assembly
of the [2Fe-2S] cluster. When studying the influence of deletions on protein-protein
interaction, it was shown that removal of residues 115-128 or 109-128 did not substantially
affect adrenodoxin reductase binding, as shown by nearly unchanged cytochrome c
reduction activity. In a CYP11A1 assay, mutants 4-108 and 4-114 exhibited 3.2-fold and
5-fold decreased Km values, respectively, whilst the dissociation constants for binding to
CYP11A1 decreased 3-fold and 1.9-fold, respectively (Table 3, Fig. 3). Additionally, in a
CYP11B1 assay, mutants 4-108 and 4-114 showed decreased Km values. The results suggest
that electron donation from adrenodoxin to CYP11A1 and CYP11B1 is determined at least
in part by different features of the cytochromes.

Similar differences were observed when analyzing mutants in which Y82 had been
replaced by serine, leucine or phenylalanine. While cytochrome c reduction was not
influenced by these replacements, the Km values changed up to 4-fold when measuring the
enzymatic activities of mutant adrenodoxins with CYP11A1 and CYP11B1 (Table 3). Again,
the changes depend on the P450 used. Finally, the Km values for some redox
partners are increased with the H56 mutants (Table 3). The Km value for mutant H56Q in the
cytochrome c assay increased twofold. Marked changes in Km values were also found in the
CYP11B1-dependent conversion of deoxycorticosterone to corticosterone. The Km values
for CYP11B1 with the H56Q and H56T mutants increased 6-fold and 4-fold, respectively
(Table 3). The effect of replacement at position 56 was less pronounced when the
hydroxylating activities of CYP11A1 were studied. As can be seen from Table 3, both H56Q
and H56T mutants exhibit only slightly increased Km values when compared to wild-type
adrenodoxin. Furthermore, the intensity of Y82 fluorescence is strongly affected by
replacement of H56.
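The fold changes quoted in this discussion can be recomputed from the mean values in Table 3. In the sketch below, each variant is compared only with the wild type of its own series, since Table 3 contains three separate wild-type controls.

```python
# Sketch: recompute the fold changes discussed above from the mean Km values
# in Table 3. Each variant is compared with the wild type measured in the
# same series, because the three series have their own wild-type controls.

ASSAYS = ("cytochrome c", "CYP11A1", "CYP11B1")

TABLE_3_KM = {  # series -> variant -> (cyt c, CYP11A1, CYP11B1) mean Km
    "deletion": {"Wild type": (5.9, 1.1, 1.2), "4-128": (5.6, 1.1, 1.3),
                 "4-114": (3.4, 0.2, 0.4), "4-108": (7.3, 0.4, 0.2)},
    "Y82": {"Wild type": (6.4, 1.8, 1.7), "Y82F": (4.7, 1.2, 0.4),
            "Y82S": (5.8, 2.5, 4.6), "Y82L": (6.1, 2.1, 0.7)},
    "H56": {"Wild type": (19.1, 2.5, 2.0), "H56Q": (40.2, 3.2, 12.2),
            "H56T": (54.2, 3.4, 8.1)},
}


def fold_changes(series: dict) -> dict:
    """Km(variant) / Km(wild type) for every assay in one series."""
    wild_type = series["Wild type"]
    return {
        name: dict(zip(ASSAYS, (km / wt for km, wt in zip(values, wild_type))))
        for name, values in series.items()
        if name != "Wild type"
    }


if __name__ == "__main__":
    for series_name, series in TABLE_3_KM.items():
        for variant, ratios in fold_changes(series).items():
            text = ", ".join(f"{assay} x{r:.2f}" for assay, r in ratios.items())
            print(f"{series_name:8s} {variant:6s} {text}")
```

Ratios above 1 indicate a higher Km than the matching wild type. For the H56 series the CYP11B1 ratios come out at about 6.1 (H56Q) and 4.05 (H56T), and the cytochrome c ratio for H56Q at about 2.1, in line with the values given above.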

From these data it can be concluded that H56 is located in the immediate environment
of Y82 and is directly or indirectly involved in binding to the redox partners of adrenodoxin,
adrenodoxin reductase and cytochromes P450. Furthermore, the results support our
conclusion that, although the major determinant of adrenodoxin for binding to its redox
partners is its highly acidic region, these proteins (adrenodoxin reductase, CYP11A1, and
CYP11B1) have slightly different binding requirements. This is plausible considering the
different physico-chemical properties of CYP11A1 and CYP11B1, such
as amino acid composition (Watanuki et al., 1978) in conjunction with the net charge of the
proteins, their solubility and stability (Takemori et al., 1975), and their immunological
(Suhara et al., 1978) and EPR properties (Kominami et al., 1979). These different features
of the cytochromes P450 obviously require differences in the site and/or mechanism of
interaction with the electron donor adrenodoxin.

These differences may also be important for allowing the proteins to discriminate
between the oxidized and reduced forms of adrenodoxin and thus to promote the formation
of productive electron transfer complexes.
ABSTRACT

DNA-protein complexes induced by potassium chromate in human leukemic T-lymphocyte
MOLT4 cells were isolated by ultracentrifugal sedimentation in the presence of 2%
sodium dodecyl sulfate (SDS) and 5 M urea. The complexes were analyzed by two-dimensional
SDS-polyacrylamide gel electrophoresis (PAGE). Three acidic proteins of 74, 44 and
42 kD, and a basic protein of 51 kD were primarily complexed to DNA following 25 µM
chromate treatment, indicating selectivity in chromate-induced DNA-protein complexes.
Higher concentrations of chromate cross-linked many other proteins to DNA. A 43 kD
protein predominantly localized in the cytoplasmic fraction was found to be cross-linked to
DNA upon chromate treatment. Partial N-terminal amino acid sequencing of p43 showed
that it could be a human lectin. Treatment of the complexes with DNase I, RNase and EDTA
revealed that sedimentation of the proteins was not due to formation of protein aggregates
but to their association with DNA. The complexes were disrupted to some extent by
EDTA, indicating the involvement of a chelatable form of chromium in the complex. Because
chromate-induced DNA-protein complexes are resistant to treatments such as 2% SDS and
5 M urea but are disrupted under gel electrophoretic conditions, chromium could possibly
be used as a cross-linking agent for identification of other proteins, such as transcription
factors, that interact with DNA.
INTRODUCTION

A number of physical and chemical carcinogenic agents have been shown to induce
DNA-protein cross-linking (Oleinick et al., 1987). Hexavalent chromium [Cr(VI)] compounds
are considered potent carcinogens and have been shown to cause various types
of DNA damage, including DNA-protein cross-linking, in various cells and tissues (see Cohen
et al., 1990, for a review). Cr(VI) does not bind to DNA in cell-free systems (Fornace et al.,
1981; Koster et al., 1985); however, it readily enters the cell through the sulfate anion
transport system (Arslan et al., 1987; Jannette et al., 1985) and is reduced by the cell's redox
system to chromium(III) [Cr(III)], which in turn binds to DNA in cell-free systems
(Tsapakos et al., 1983). Cr(III) is poorly taken up into the cell, possibly explaining why this
form of chromium has not been shown to be carcinogenic (De Flora et al., 1989).

Although chromate-induced DNA-protein complexes are implicated in chromate
carcinogenicity, the mechanisms of their formation, their composition, and their biological
significance are not well understood. It has been postulated that cross-linking of proteins to DNA
could disrupt chromatin structure and the normal regulation of gene expression. This, in turn,
could play a role in carcinogenesis, in that deletion of DNA bases may result when portions of
replicating DNA are buried under DNA-protein complexes (Bedinger et al., 1983; Briggs and
Briggs, 1988). Such deletions may also give rise to loss or inactivation of tumor suppressor genes
(Bouck and Benjamin, 1989). During normal regulation of gene expression, proteins reversibly
interact with specific DNA sequences (Stein and Kleinsmith, 1979). Cross-linking of DNA with
inappropriate proteins could disrupt the normal regulation of DNA-protein interactions, an event
that may be anticipated to have serious genetic consequences, including disruption or alteration
of gene expression. Therefore, it is necessary to determine the identity of proteins that are
cross-linked to DNA upon chromate exposure, as well as the nature of their interaction with DNA.
Identification of proteins cross-linked to DNA may further contribute to a better understanding of
chromatin-protein interactions, including the three-dimensional orientation of proteins around DNA.

In the present study, we have analyzed the changes in the protein constituents of
chromatin following chromate treatment of MOLT4 cells and have attempted to identify, by
partial sequencing, a 43 kD protein complexed to DNA upon chromate treatment. Inappropriate
complexing to DNA of proteins of structural or functional importance, rather than the
DNA-protein complexes themselves, may play an important role in chromate carcinogenicity.
Thus, identification of proteins participating in chromate-induced DNA-protein complexes
will be required in order to better understand the potential consequences of this lesion.

Chemicals 

Potassium chromate was purchased from J.T. Baker (Phillipsburg, NJ). Acrylamide,
N,N,N',N'-tetramethylethylenediamine (TEMED), ammonium persulfate, protein determination
kit, Coomassie brilliant blue R-250, ampholines, urea and sodium dodecyl sulfate (SDS)
were purchased from Bio-Rad Laboratories (Richmond, CA). Polyvinylidene difluoride
(PVDF) membrane, [3H]thymidine, [35S]methionine and Aquassure LSC cocktail were
purchased from New England Nuclear (Boston, MA). DNase-free RNase was purchased from
Boehringer Mannheim (Indianapolis, IN). All other chemicals and enzymes were purchased
from Sigma Chemical Co. (St. Louis, MO).

Cell  Culture  and  Treatment 

Human leukemic T-lymphocyte MOLT4 cells were purchased from the American Type
Culture Collection (Bethesda, MD) and were maintained in suspension in exponential growth
phase in RPMI 1640 [N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid (HEPES)-modified]
medium supplemented with 10% heat-inactivated fetal bovine serum, 10 U penicillin and 10
µg/ml streptomycin solution at 37°C in a humidified atmosphere of 5% CO2 and 95% air. Cells in
exponential growth phase were radiolabeled with [3H]thymidine and [35S]methionine (0.02 µCi/ml
each) for ~36 hr in methionine-free RPMI 1640 medium. Radiolabeled cells were collected by
centrifugation, washed three times in cold Saline A (5 mM NaHCO3, 6 mM dextrose, 5 mM KCl
and 140 mM NaCl, pH 7.2) and resuspended in salts-glucose medium [SGM: 50 mM HEPES, 100
mM NaCl, 5 mM KCl, 2 mM CaCl2, 5 mM dextrose, pH 7.2] at a concentration of 1 x 10^ cells/ml.
Potassium chloride (control) or potassium chromate was added to the cell suspensions at different
concentrations for incubation periods of 2 hr or 16 hr. Following treatment, cells were collected
and cytotoxicity was determined by trypan blue exclusion.
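Cytotoxicity by trypan blue exclusion reduces to counting stained and unstained cells. The sketch below shows the usual percentage calculation and, as an assumption about how results might be normalized, expresses viability relative to the KCl control; the counts are hypothetical.

```python
# Minimal sketch: viability by trypan blue exclusion, expressed both as the
# percentage of unstained cells and relative to the KCl-treated control.
# The cell counts below are hypothetical hemocytometer counts.


def percent_viable(unstained: int, stained: int) -> float:
    """Percentage of cells that exclude trypan blue."""
    total = unstained + stained
    if total == 0:
        raise ValueError("no cells counted")
    return 100.0 * unstained / total


def viability_relative_to_control(treated_pct: float, control_pct: float) -> float:
    """Viability of treated cells as a percentage of the control viability."""
    return 100.0 * treated_pct / control_pct


if __name__ == "__main__":
    control = percent_viable(unstained=190, stained=10)   # KCl control
    treated = percent_viable(unstained=171, stained=29)   # chromate-treated
    print(f"control {control:.1f}%, treated {treated:.1f}%, "
          f"relative {viability_relative_to_control(treated, control):.1f}%")
```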

Isolation  and  Quantitation  of  DNA-Protein  Complexes 

Potassium chromate-treated and control cells were collected by centrifugation at 500 × g
for 10 min, after which cells were washed three times in Saline A and incubated for 15 min on ice
in cold hypotonic buffer (10 mM Tris-HCl, pH 7.5, containing 10 mM NaCl and 1.5 mM MgCl2).
Cells were collected by centrifugation at 300 × g for 5 min and were lysed in the above solution
supplemented with 0.5% Nonidet P-40 and 1 mM phenylmethylsulfonyl fluoride (PMSF), using a
loose-fitting glass homogenizer. The nuclei were sedimented at 700 × g for 5 min at 4°C, resuspended in
10 mM Tris-HCl containing 250 mM sucrose, 5 mM MgCl2 and 1 mM PMSF (pH 7.5), and were
layered over a similar solution containing 880 mM sucrose. Nuclei were subsequently collected by
centrifugation for 10 min at 1000 × g at 4°C and used for isolation of DNA-protein complexes.

The DNA-protein complexes were isolated from the nuclei of control or chromate-treated
cells by a modification of a previously described method (Miller and Costa, 1989). Briefly, the
purified nuclei were solubilized in 35 ml of 10 mM Tris-HCl containing 2% SDS and 1 mM PMSF
(pH 7.5) by shaking on a platform shaker for 6 hr at room temperature. The samples were then
homogenized using a tight-fitting homogenizer and sedimented at 100,000 × g for 16 hr at
18°C, using a Beckman SW 27 rotor (Beckman Instruments, Fullerton, CA). The pellets were
placed in a solution of 5 M urea and 1 mM PMSF, shaken at 4°C for 6 hr, and homogenized again.
SDS was added to 2% final concentration and the DNA-protein complexes were isolated by
ultracentrifugation as above. The pellets were resuspended in 1 ml of 10 mM Tris-HCl containing
1 mM PMSF (pH 7.5) by gentle sonication using a microprobe (Model W 225 R, Ultrasonics, Inc.,
Plainview, New York), and were precipitated in 70% acetone at -20°C. The different steps in
isolation of DNA-protein complexes are summarized in Scheme 1.

The DNA-protein complexes were collected by centrifugation at 12,500 × g for 15 min
at 4°C using a Beckman microfuge and were resuspended in 1 ml of 10 mM Tris-HCl containing
1 mM PMSF (pH 7.5) by gentle sonication or by shaking in a Nutator for about 16 hr at 4°C.
The DNA content was determined by measuring the absorbance at 260 nm (Maniatis et al.,
1982). Both 3H and 35S activities were determined by dissolving the samples in Aquassure
cocktail (NEN, Boston, MA) and counting in a Beckman LS 5800 liquid scintillation counter
(Beckman Instruments, Inc., Irvine, CA). The protein content of the Nonidet P-40 homogenate
and the 35S specific activity were determined using the Bio-Rad dye (Bradford, 1976) and by
measuring the 35S activity in the acid-insoluble material, respectively.
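The quantities measured above can be combined into an amount of protein per amount of DNA in the complexes. The sketch below is illustrative only: it assumes the standard conversion of one A260 unit to about 50 µg/ml double-stranded DNA (1 cm path), a 35S specific activity expressed as cpm per µg protein, and 3H/35S counts already corrected for channel spillover. All example numbers are hypothetical.

```python
# Sketch of the bookkeeping behind the DNA and protein measurements.
# Assumptions: A260 = 1.0 corresponds to about 50 µg/ml double-stranded DNA
# (1 cm path), the 35S specific activity is cpm per µg protein, and 3H/35S
# channel spillover has already been corrected. Example numbers are made up.

DSDNA_UG_PER_ML_PER_A260 = 50.0


def dna_ug(a260: float, volume_ml: float, dilution: float = 1.0) -> float:
    """Total DNA (µg) in a sample from its A260 reading."""
    return a260 * dilution * DSDNA_UG_PER_ML_PER_A260 * volume_ml


def specific_activity_cpm_per_ug(acid_insoluble_cpm: float,
                                 protein_ug: float) -> float:
    """35S cpm per µg protein in the Nonidet P-40 homogenate."""
    return acid_insoluble_cpm / protein_ug


def complexed_protein_ug(s35_cpm: float, cpm_per_ug: float) -> float:
    """Protein (µg) in the DNA-protein complexes from their 35S counts."""
    return s35_cpm / cpm_per_ug


if __name__ == "__main__":
    dna = dna_ug(a260=0.4, volume_ml=1.0)               # 20 µg DNA
    sa = specific_activity_cpm_per_ug(50_000, 100.0)    # 500 cpm per µg
    protein = complexed_protein_ug(2_000, sa)           # 4 µg protein
    print(f"{protein / dna:.2f} µg protein per µg DNA")
```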

Determination  of  Stability  of  DNA-Protein  Complexes 

DNA-protein complexes containing 100 µg DNA were placed in siliconized microcentrifuge
tubes. MgCl2 was added to 5 mM in samples to be treated with DNase I and RNase.
DNase I (200 µg/ml), RNase (40 µg/ml) or EDTA (10-50 mM) was added, the samples were mixed
and the tubes were incubated at room temperature for 2 hr. SDS was then added to a final
concentration of 0.5% and samples were centrifuged at 100,000 × g for 16 hr at 18°C. The
supernatants were carefully removed and the pellets were resuspended in 10 mM Tris-HCl
(pH 7.5) by brief sonication. DNA and protein contents were determined from the ³H and ³⁵S
activity, respectively, by liquid scintillation counting as described above.

Scheme 1. Isolation of DNA-protein cross-links by SDS/urea extraction method
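Recovery in the pellet is reported as a percentage of the untreated control, given as mean ± SD of three normalized values (Table 1). A minimal sketch of that normalization follows. It assumes that each treated sample is paired with its own untreated control, that the sample (n - 1) standard deviation is used, and that scintillation background has already been subtracted; none of these details is stated in the text.

```python
from statistics import mean, stdev


def percent_recovery(pellet_cpm: float, control_pellet_cpm: float) -> float:
    """Pellet cpm after treatment as a percentage of the untreated control pellet."""
    if control_pellet_cpm <= 0:
        raise ValueError("control pellet cpm must be positive")
    return 100.0 * pellet_cpm / control_pellet_cpm


def summarize_recovery(treated_cpm: list[float], control_cpm: list[float]) -> tuple[float, float]:
    """Mean and sample SD of per-replicate normalized recoveries (paired by index)."""
    if len(treated_cpm) != len(control_cpm) or len(treated_cpm) < 2:
        raise ValueError("need at least two paired replicates")
    values = [percent_recovery(t, c) for t, c in zip(treated_cpm, control_cpm)]
    return mean(values), stdev(values)
```

The same function applies to ³H counts (DNA recovery) and ³⁵S counts (protein recovery); only the input counts change.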

Analysis  of  Proteins  by  Two-Dimensional  Gel  Electrophoresis 

Proteins were analyzed by the nonequilibrium method of two-dimensional gel
electrophoresis as described by O’Farrell et al. (1977). DNA-protein complexes containing
150 µg of DNA were acetone precipitated or lyophilized (FTS Systems, Inc., Stone Ridge,
NY) and solubilized in 30 µl solubilizing buffer [9 M urea, 4% Nonidet P-40, 2% β-mercaptoethanol
and 2% ampholines (Bio-Rad; pH ranges 3-10 and 8-10, 4:1)]. Isoelectric
focusing was carried out in 200 µl capillary tubes (1.5 mm diameter, Fisher Scientific, NJ)
containing 4% polyacrylamide and 2% ampholines (pH range 3-10). Cytochrome c, a
colored protein, was used to indicate the mobility of basic proteins. Second-dimension
separation was carried out on 12% SDS-polyacrylamide gels. The gels were
silver stained following the method of Sammons et al. (1981).

Cytoplasmic (Nonidet P-40-soluble cytoplasmic material) and nuclear (SDS-soluble
material of the isolated nuclei) protein fractions were saved and their protein contents were
determined as described above. Thirty µg of protein from each fraction was acetone precipitated
and solubilized in 20 µl of solubilizing buffer. Nonequilibrium focusing, separation in the
second dimension, and silver staining of samples were carried out as described above.

Electroblotting  and  Amino  Acid  Sequencing 

DNA-protein complexes containing 250 µg of DNA were analyzed by two-dimensional
gel electrophoresis as previously described (Mattagajasingh and Misra, 1994). The
second-dimension gel was pre-run in the presence of 1 mM sodium thioglycolate to protect
the N-termini from reactive compounds. Proteins were electroblotted onto a PVDF
membrane in a Bio-Rad Trans-Blot apparatus (Bio-Rad, Richmond, CA) using 10 mM
3-cyclohexylamino-1-propanesulfonic acid (CAPS) and 10% HPLC-grade methanol (pH
11) as the electroblotting buffer, at 50 V for 1 hr at room temperature. Coomassie Brilliant
Blue R-250 (Bio-Rad, 0.025% in 40% methanol) was used to visualize the proteins. Acetic
acid was omitted from the staining and destaining solutions because it may cause N-terminal
blocking (Bio-Rad technical bulletin #240). The protein band of interest was excised
and automated Edman degradation was performed using an Applied Biosystems 477A
protein sequencer equipped with a 120A analyzer (Applied Biosystems, Inc., Foster City,
CA).


RESULTS 

Cytotoxicity 

Exposure of MOLT4 cells to 0-200 µM potassium chromate in SGM for 2 hr was
found to have little cytotoxic effect, as assessed by trypan blue exclusion (viability was
within 98 ± 2% of the control). The viability of cells treated with 200 µM chromate for 16
hr in SGM decreased to 72 ± 3% of the control.
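Viability here is expressed as a percentage of the control. The counting scheme is not described, so the sketch assumes the usual trypan blue convention (unstained cells scored as viable) and then normalizes treated viability to control viability; the function names are hypothetical.

```python
def percent_viable(unstained: int, stained: int) -> float:
    """Trypan blue exclusion: unstained cells are scored as viable."""
    total = unstained + stained
    if total <= 0:
        raise ValueError("no cells counted")
    return 100.0 * unstained / total


def relative_viability(treated_pct: float, control_pct: float) -> float:
    """Viability of treated cells expressed as a percentage of the control."""
    if control_pct <= 0:
        raise ValueError("control viability must be positive")
    return 100.0 * treated_pct / control_pct
```

Normalizing to the control separates the effect of chromate from the baseline loss of viability that occurs during handling.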

Effect of Potassium Chromate on DNA-Protein Crosslinking in MOLT4 Cells


Exposure of cells to 0-200 µM potassium chromate for 2 hr resulted in a dose-dependent
increase in the formation of DNA-protein complexes in MOLT4 cells. Cells treated with
200 µM chromate for 2 hr showed an approximately two-fold increase in DNA-protein complex
formation compared with the control (data not shown). When cells were treated with 200 µM
chromate for 16 hr, an 8- to 10-fold increase in the formation of DNA-protein complexes was
observed relative to control cells.

Stability  of  DNA-Protein  Complexes 

The stability of the DNA-protein complexes was tested by monitoring the recovery of
DNA and protein in the pellet following treatment with DNase I, RNase and EDTA. The control
samples (without DNase I or EDTA) showed almost 100% recovery of both DNA and protein
in the pellet following ultracentrifugation, as determined by ³H- and ³⁵S-radioactivity,
respectively. DNase I treatment of DNA-protein complexes isolated from both control and
chromate-treated cells significantly reduced the recovery of ³H and ³⁵S in the pellet
(Table 1). RNase treatment of DNA-protein complexes did not interfere with the recovery of
DNA or protein (Table 1). These data indicate that chromate treatment induces the cross-linking
of proteins to DNA rather than the formation of sedimentable protein aggregates, and they are
consistent with the findings of Miller and Costa (1989) for chromate-induced DNA-protein
complexes in cultured Chinese hamster ovary (CHO) cells.

In order to test whether chromium participates directly in the DNA-protein
complexes, excess EDTA was used to disrupt the complexes by chelating chromium.
To test whether EDTA chelation of chromium is complete, 0-100 mM EDTA was
used to disrupt the DNA-protein complexes isolated from both ³⁵S-methionine- and
⁵¹Cr-labeled cells. Dissociation of ³⁵S or ⁵¹Cr did not increase above 50 mM EDTA
(data not shown). As shown in Table 1, EDTA treatment of DNA-protein complexes isolated
from control cells did not affect the recovery of ³H- or ³⁵S-radioactivity in the pellet. When
chromate-induced DNA-protein complexes were treated with EDTA, the recovery of
³⁵S-radioactivity in the pellet decreased without any change in the recovery of ³H-activity.
These results indicate that the decrease in ³⁵S activity was not due to fragmentation of DNA.
The maximum decrease in ³⁵S recovery after EDTA (50 mM) treatment was approximately 18%
of the control (Table 1).

Table 1. Stability of potassium chromate-induced DNA-protein complexes in intact MOLT4 cells
(values are DNA or protein recovered in the pellet, %)

                Chromate-induced complexes      Control complexes
Treatment       DNA            Protein          DNA            Protein
Control         100            100              100            100
DNase I         3.7 ± 2.4      7.4 ± 3.7        1.8 ± 3.1      2.3 ± 2.4
EDTA            98.3 ± 3.5     81.3 ± 4.3       98.8 ± 4.9     96.7 ± 2.8
RNase           96.4 ± 5.1     98.2 ± 6.3       96.2 ± 5.8     97.3 ± 5.7

³⁵S- and ³H-labeled DNA-protein complexes were treated with DNase I (200 µg/ml), RNase
(40 µg/ml) or EDTA (50 mM), incubated for 2 hr at room temperature and sedimented by
ultracentrifugation. The recovery of protein and DNA from the complexes was determined from
the recovery of ³⁵S and ³H cpm in the pellet, respectively. Each value is the mean ± SD of
three normalized values.


Figure 1. Nonequilibrium two-dimensional electrophoresis of DNA-protein complexes isolated from nuclei
of control or chromate-treated cells. DNA-protein complexes were isolated from potassium chloride-treated
(control) or potassium chromate-treated cells. (A) Two-dimensional gel of the control DNA-protein complexes.
(B) and (C) Two-dimensional gels of proteins dissociated from DNA-protein complexes generated by treatment
of MOLT4 cells with 25 and 200 µM chromate for 16 hr, respectively.
Two-Dimensional Gel Electrophoretic Analysis of Proteins Complexed to DNA

The proteins complexed to DNA in both control (potassium chloride-treated) and potassium
chromate-treated cells were analyzed by two-dimensional gel electrophoresis. DNA-protein
complexes were loaded on the acidic end of the gel in order to prevent nucleic acids from
entering the first-dimension focusing gels. Silver staining of two-dimensional gels of
DNA-protein complexes isolated from control cells did not show any protein in the gel,
indicating that the SDS/urea method used for isolation of DNA-protein complexes
effectively dissociates the DNA-protein complexes in control cells (Fig. 1A).

The proteins that complexed to DNA in cells exposed to 25 µM chromate for 16 hr, and that
did not dissociate from DNA upon treatment with 2% SDS and 5 M urea, are shown in Fig. 1B.
As shown in this figure, primarily three acidic proteins (‘a’, ‘b’ and ‘c’) and one basic
protein (‘d’) were complexed to DNA upon chromate treatment. Analysis of the molecular
weight and pI of these proteins showed that protein ‘a’ has a pI of 5.2-5.6 and a molecular
weight of 74 kD, protein ‘b’ has a pI of 5.2-5.4 and a molecular weight of 44 kD, and
protein ‘c’ has a pI of ~5.8 and a molecular weight of 42 kD. Protein ‘d’, on the other hand,
has a pI of 8.8-9.2 and a molecular weight of 51 kD. When cells were treated with 200 µM
chromate for 16 hr, many other proteins, in addition to the above proteins, were found to be
complexed to DNA (Fig. 1C), indicating that the number of proteins cross-linked to DNA
depends on the dose of chromate. The molecular weights and pI values of the major proteins
complexed to DNA upon 200 µM chromate treatment are listed in Table 2.

Table 2. Molecular weight and pI of major proteins complexed to DNA upon chromate
treatment of intact MOLT4 cells

Molecular weight (×10³)    pI
74                         5.2-5.6
63                         5.2-5.4
53                         5.2
51 (d*)                    8.8-9.2
44 (b*)                    5.3
43 (p43)                   6.0-6.5
42 (c*)                    5.8
40                         4.8-5.0
36                         5.0-5.2
36-38 (CNP)                5.5-12
29                         6.8
25-28 (CNP)                7.0-8.5
19                         6.4-6.8
16 (CNP)                   5.6-6.8

CNP: cluster of nuclear proteins.
*Proteins marked in Figure 1B.
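The text does not say how the molecular weights in Table 2 were derived. A common approach, assumed here, is to interpolate from marker proteins run on the same SDS gel, using the roughly linear relation between log10(molecular weight) and relative mobility (Rf). The sketch below fits that line by least squares; the function names are hypothetical, and no marker data from this study are implied.

```python
import math


def fit_log_mw(rf: list[float], mw_kda: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(MW) = slope * Rf + intercept for marker proteins."""
    if len(rf) != len(mw_kda) or len(rf) < 2:
        raise ValueError("need at least two markers with matching Rf and MW")
    n = len(rf)
    ys = [math.log10(m) for m in mw_kda]
    mean_x = sum(rf) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in rf)
    if sxx == 0:
        raise ValueError("marker Rf values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rf, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw_kda(rf: float, slope: float, intercept: float) -> float:
    """Interpolated molecular weight (kD) for a band with relative mobility rf."""
    return 10 ** (slope * rf + intercept)
```

The calibration holds only within the range spanned by the markers. The pI values in Table 2 come from the first (focusing) dimension and are not affected by this fit.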

To determine the subcellular localization of the four proteins primarily complexed
to DNA, cytoplasmic and nuclear protein fractions were analyzed by two-dimensional gel
electrophoresis (Fig. 2). Proteins ‘b’, ‘c’ and ‘d’ were visualized and were found to
correspond to proteins in the cytoplasmic fraction. Proteins ‘a’ and ‘d’ were predominantly
present in the nuclear fraction. Additional proteins (‘m’, ‘n’, ‘o’ and ‘p’), which were
predominantly present in the cytoplasmic fraction, were also found complexed to DNA upon
treatment of cells with 200 µM chromate for 16 hr.


Analysis  of  Amino  Acid  Sequence 

The protein ‘p’ (p43, pI 6.0-6.5) was predominantly detected in the cytoplasmic
fraction but was found to be abundantly cross-linked to DNA. Therefore, attempts were made
to identify this protein by partial N-terminal sequencing and homology comparison with
proteins listed in the SWISS-PROT protein database. N-terminal sequencing of p43 revealed
six consecutive amino acids, as listed in Fig. 3. This sequence was found to be completely
homologous with amino acid residues 24-29 of lectin bra-3. It is also partially
homologous to many glycoproteins and to human multidrug-resistance protein 1.
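The comparison with lectin bra-3 can be checked residue by residue. In Figure 3 the primary call at position 1 is Ala, while lectin bra-3 has Arg at residue 24, and the figure also prints alternative residues (Thr, Ser, Gly, Arg). The sketch assumes those alternatives all belong to position 1; the figure layout does not confirm this. It counts strict identities and ambiguity-tolerant matches. Variable and function names are hypothetical.

```python
# p43 residue calls from Fig. 3. Placing all printed alternatives at
# position 1 is an assumption.
P43_CALLS = [
    {"Ala", "Thr", "Ser", "Gly", "Arg"},  # position 1 (ambiguous, assumed)
    {"Trp"},
    {"Asn"},
    {"Asp"},
    {"Ala"},
    {"Gln"},
]
P43_PRIMARY = ["Ala", "Trp", "Asn", "Asp", "Ala", "Gln"]

# Residues 24-29 of lectin bra-3, as shown in Fig. 3.
LECTIN_BRA3_24_29 = ["Arg", "Trp", "Asn", "Asp", "Ala", "Gln"]


def strict_identities(query: list[str], reference: list[str]) -> int:
    """Count positions where the primary call equals the reference residue."""
    return sum(q == r for q, r in zip(query, reference))


def ambiguous_matches(calls: list[set[str]], reference: list[str]) -> int:
    """Count positions where the reference residue is among the candidate calls."""
    return sum(r in options for options, r in zip(calls, reference))


if __name__ == "__main__":
    n = len(LECTIN_BRA3_24_29)
    print(f"strict: {strict_identities(P43_PRIMARY, LECTIN_BRA3_24_29)}/{n}")
    print(f"with ambiguity: {ambiguous_matches(P43_CALLS, LECTIN_BRA3_24_29)}/{n}")
```

With the primary reading, 5 of 6 residues are identical. A full 6/6 match depends on accepting Arg at the ambiguous first cycle.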
[Figure 2 gels (A, B); horizontal axis: acidic to basic]

Figure 2. Localization of major proteins cross-linking to DNA upon chromate treatment in cytoplasmic and
nuclear protein fractions. Thirty µg of cytoplasmic proteins (A) and nuclear proteins (B) were analyzed by
nonequilibrium two-dimensional gel electrophoresis and silver stained.


DISCUSSION 

In the present report we have shown that treatment of cells with 0-200 µM chromate
increases the association of proteins with DNA in a dose-dependent manner without causing
immediate cell death. This indicates that formation of DNA-protein complexes was
not related to cell killing, at least at the early stages of chromate interaction. The proteins
associated with DNA upon chromate exposure are expected to be nuclear proteins because,
in this study, DNA-protein complexes were isolated from nuclei of chromate-exposed cells
and were therefore free from cytoplasmic proteins.

Figure 3. Microsequencing of p43.

Sequence obtained:                 Ala-Trp-Asn-Asp-Ala-Gln
  (alternative residues shown in the original figure: Thr, Ser, Gly, Arg)
Homologous part of lectin bra-3:  -Arg-Trp-Asn-Asp-Ala-Gln-
Amino acid #:                       24  25  26  27  28  29

Following two-dimensional gel electrophoresis, proteins were electroblotted to a PVDF membrane.
The protein band of interest was trimmed and Edman degradation was performed in an Applied
Biosystems 477A protein sequencer.

DNase I treatment of DNA-protein complexes isolated from control or chromate-treated cell
nuclei dissociated most of the proteins associated with DNA (Table 1), indicating that the
sedimentable nature of the proteins is due to their association with genomic DNA and not to
protein aggregation following chromate treatment. The small amount of DNA-protein complexes
that sedimented after DNase I digestion appears to be mostly in the form of stable
chromium-nucleoprotein complexes. This is consistent with the findings of other investigators
who have demonstrated the resistance of chromium-bound nucleoli to nuclease digestion (Ono et
al., 1981), and the cross-linking of nuclear matrix proteins to DNA by heavy metals and UV
irradiation (Wedrychowski et al., 1986; Bouliakas, 1986). The resistance of the DNA-protein
complexes to RNase digestion indicates that chromate treatment does not induce the formation
of RNA-protein complexes. The stability of the DNA-protein complexes was further assessed by
monitoring their resistance to EDTA treatment. EDTA treatment dissociated only about 18% of
the ³⁵S activity from the DNA-protein complexes. Because EDTA effectively chelates Cr(III) but
binds the chromate oxyanion poorly, the association of the EDTA-dissociable proteins with DNA
could have been mediated by a chelatable form of chromium such as Cr(III).
However, the majority of the chromate-induced DNA-protein complexes were resistant to
EDTA treatment. These data suggest that the predominant form of chromium in the DNA-protein
complexes is not Cr(III). It is, however, plausible that there may be some direct
interaction of proteins with DNA due to the generation of free radicals during the intracellular
reduction of chromate.

Intracellular Cr(III) is predominantly generated by the reduction of Cr(VI), a process
shown to generate oxygen free radicals (Kawanishi et al., 1986; Shi and Dalal, 1989).
Although the role of free radicals in chromate-induced DNA-protein cross-linking is
uncertain (Standeven and Wetterhahn, 1991), free radical-generating systems such as
ionizing radiation and Fenton-type reactions have also been shown to cause DNA-protein
cross-linking (Chiu et al., 1985; Lesko et al., 1982). Furthermore, we have shown that
treatment of cells with antioxidants prior to chromate treatment inhibited chromate-induced
DNA-protein cross-linking (Mattagajasingh and Misra, 1993). Collectively, these results
suggest that free radicals may be, at least in part, involved in chromate-induced DNA-protein
cross-linking. However, the present results suggest that free radical-independent mechanisms
may also play a role in some chromate-induced DNA-protein cross-linking: the electrophoretic
conditions would not disrupt radical-induced covalent DNA-protein cross-links, yet we were
able to visualize the proteins cross-linked to DNA in two-dimensional gels without digesting
the DNA. Visualization of proteins in two-dimensional gels may, at least in part, be due to
the reduction of sulfhydryl groups of proteins by 2-mercaptoethanol, used in SDS-PAGE,
leading to disruption of the complex. Such mechanisms have been shown to be the leading
cause of chromate- and cisplatin-induced DNA-protein complexes (Wedrychowski et al., 1986).

In order to be cross-linked to DNA by any cross-linking agent, a protein
must reside in close proximity to DNA, and its reactive groups must be oriented such
that they can interact with reactive groups of DNA. The present studies show that only
four proteins (‘a’, ‘b’, ‘c’ and ‘d’) were primarily complexed to DNA, although
several other proteins were seen in the nuclear protein fraction (Fig. 2B). Since chromate
was required for the cross-linking of proteins with DNA, it is conceivable that not only
the close proximity of these proteins to DNA but also their selective interaction with
chromium could be important factors in the cross-linking of these proteins to
DNA. Other investigators have reported the association of a 45 kD protein (similar in
molecular weight and pI to protein ‘b’ detected here) with DNA by chromium (Wedrychowski
et al., 1985; 1986) and ionizing radiation (Chiu et al., 1985). This protein has been
identified as nuclear actin (Miller and Costa, 1989) in CHO cells exposed to chromate.
The identity of proteins ‘a’, ‘c’ and ‘d’ remains to be ascertained. Although histones constitute
a substantial part of chromatin, these basic proteins were not complexed to DNA
upon chromate treatment. Similar results were reported by Miller and Costa (1989).
Because Cr(III) has a high affinity for sulfur-containing ligands, and cysteine residues
are scarce in histones, it appears plausible that histones do not complex with DNA upon
chromate treatment because appropriate ligands are unavailable.

In the present study, the homology of the p43 microsequence with amino acids 24-29 of
lectin bra-3 indicates that p43 could be a human lectin. Lectins have not previously been shown
to be DNA-binding proteins. Lectins are found in a wide variety of cells
and cell membranes. Although lectin receptors have been found on the cytoplasmic surface
of intracellular membranes such as the nuclear envelope and mitochondrial outer membrane,
recent evidence indicates that lectin binding takes place on the noncytoplasmic surface of
these organelles (Lis and Sharon, 1986a). Alterations in lectin levels upon malignant
transformation (Gabius et al., 1986) and the involvement of lectins in developmental processes
(Kolb-Bachofen, 1986) have been reported. Lectins have also been shown to function as
receptors (Kolb-Bachofen, 1986) and mitogenic regulators (Lis and Sharon, 1986b).
Because chromate-induced DNA-protein complexes occur predominantly in transcriptionally
active DNA (Hamilton and Wetterhahn, 1989), it is tempting to speculate that lectins
may be involved in the transcription process. Regardless, the cross-linking of lectins to
DNA could lead to serious physiological and genetic consequences.

In summary, our results indicate that chromate treatment modifies chromatin
structure through complex formation with a selected group of non-histone proteins. A lectin
is suspected to be one of the proteins involved in chromate-induced DNA-protein
complexes. The exact nature of the interaction between the DNA and protein remains
obscure. However, our results suggest both the participation of a chelatable form of
chromium such as Cr(III) and the involvement of oxidative mechanisms in
chromate-induced DNA-protein cross-linking. Although chromate-induced
DNA-protein complexes are resistant to treatments such as 2% SDS and 5 M urea,
their dissociation under gel electrophoretic conditions indicates that they are held together
by non-covalent interactions. These characteristics of chromate-induced DNA-protein
complexes suggest that chromium could be used in studies of chromatin structure and in the
identification of proteins participating in DNA-protein interactions, specifically those that
interact transiently with DNA, such as transcription factors.
INTRODUCTION

The nicotinic acetylcholine receptor (AChR) is a membrane protein at the postsynaptic
side of the neuromuscular junction. It has a principal role in postsynaptic neuromuscular
transmission because it mediates ion flux across the membrane (1-2). The receptor is a pentamer
composed of four subunits, α2βγδ. Functional studies have focused mostly on the α-subunit
because it is responsible for binding acetylcholine (3-5) and α-neurotoxins (6). Snake venom
postsynaptic neurotoxins form a large family of related proteins of which two subgroups,
the long and short neurotoxins, are major constituents. Both long and short neurotoxins are
known to bind specifically to the α-chain of AChR, competing with cholinergic
ligands (7-8), but they differ in their association and dissociation kinetics.

The primary structures of the four AChR subunits of Torpedo californica (t) (9-12)
and mouse (m) (13-16), and of the α-subunits of human (h) and bovine (17) AChR, have been deduced
from the respective cDNA or mRNA sequences. From the primary structure of each AChR
subunit, it was possible to identify the transmembrane hydrophobic regions and the extracellular
part of the chain (11,18,19). We carried out immunological and toxin-binding studies on
inter-transmembrane synthetic peptides, which confirmed (20) the model postulating five
transmembrane regions (18,19). These investigations provided an outline of the transmembrane
organization of the AChR subunits and a working 3-D model for the α-neurotoxin and
the AChR binding cavity on AChR.

In recent work, applying a comprehensive synthetic strategy previously introduced
and developed by this laboratory (21,22), we mapped (23) the extracellular surface of
the α-chain of tAChR for regions that are accessible to binding by antibodies against
a panel of synthetic overlapping peptides that encompassed the entire extracellular
part of the chain. The binding of the anti-peptide antibodies to membrane-bound tAChR
and to isolated, soluble tAChR was determined. These binding studies, which enabled a
comparison of the accessible regions in membrane-bound and free AChR, revealed
that the receptor undergoes considerable changes in conformation upon removal from
the cell membrane (23). We also determined the structural organization of the main
extracellular domain of the mAChR α-subunit on live mouse muscle cells in culture (24). A
comparison of this binding profile with the profile obtained with membrane-bound AChR
in isolated membrane fractions showed some similarities as well as significant differences
between the subunit organization in the isolated membrane fraction and in the membrane
of live muscle cells. The exposed regions defined by this study (24) may be the primary
targets for the initial autoimmune attack on the receptors in experimental autoimmune
myasthenia gravis.

Recent studies from this laboratory mapped the full profile of binding regions
for long α-neurotoxins [α-bungarotoxin (BTX) and cobratoxin (Cbt)] on the α-subunits
of tAChR (25-27) and hAChR (28), as well as the binding regions for short
neurotoxins [erabutoxin (Eb) and cobrotoxin (Cot)] on the α-subunits of tAChR and
hAChR (29). Conversely, the binding regions for AChR on BTX were mapped with
synthetic peptides representing each of the toxin loops (30,31). Identification of the
binding regions on AChR for short and long neurotoxins has provided a molecular
explanation for the observed differences between the two toxin groups in their actions
on AChR (29).

Interaction of acetylcholine receptor with acetylcholine and α-neurotoxins

Localization  and  synthesis  of  the  acetylcholine-binding  site 

On the basis of sequence analysis and the structural topology of the α-subunit, it has
been proposed that the invariant cysteine residues 128 and 142 form a disulfide bridge, the
integrity of which is essential for the binding of ACh to the receptor (1,9-11,32). We have
localized, by peptide synthesis, the acetylcholine-binding site in both human and Torpedo
receptors (5). A peptide containing this loop region (residues α125-147) was synthesized,
and solid-phase radiometric binding assays demonstrated that it bound ¹²⁵I-labeled BTX
strongly (5, 25). It was further shown that the free peptide bound well to [³H]acetylcholine (5).
Pretreatment of peptide α125-147 with 2-mercaptoethanol destroyed its binding activity,
clearly showing that the integrity of the disulfide-bonded loop structure was
essential for binding. Unlabeled ACh also inhibited the binding of labeled ACh to the
synthetic peptide. The region α125-147, therefore, contains essential elements of the
ACh-binding site of AChR (5). It is not surprising, therefore, that immune responses to this
peptide are involved in the pathogenesis of experimental autoimmune myasthenia gravis
(33,34). It has been noted (5), however, that the results do not preclude the possibility that
additional residues outside the region α125-147 are involved in the binding of
acetylcholine to AChR.
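The text states only that unlabeled ACh inhibited the binding of labeled ACh; it gives neither values nor a formula. A common way to express such a competition result, assumed here, is the percent inhibition of specific binding after subtracting nonspecific binding. The function name and arguments are hypothetical.

```python
def percent_inhibition(bound_with_competitor: float,
                       bound_without_competitor: float,
                       nonspecific: float = 0.0) -> float:
    """Percent inhibition of specific binding of a labeled ligand by an unlabeled competitor.

    All inputs are counts (e.g. cpm); nonspecific binding is subtracted from both
    the competed and uncompeted values before taking the ratio.
    """
    specific_uncompeted = bound_without_competitor - nonspecific
    if specific_uncompeted <= 0:
        raise ValueError("no specific binding in the uncompeted sample")
    specific_competed = bound_with_competitor - nonspecific
    return 100.0 * (1.0 - specific_competed / specific_uncompeted)
```

Values near 100% mean the unlabeled ligand fully displaced the labeled one, which is the behavior expected if both occupy the same site on the peptide.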

The α-Neurotoxin Binding Regions on Human and Torpedo AChR

A comprehensive synthetic-peptide strategy we originated (21,22) was applied
to tAChR and hAChR (Figure 1) and enabled us to map the full profile of the continuous
binding regions for long and short α-neurotoxins on the extracellular part (residues
α1-210) of the α-chains of tAChR and hAChR. In tAChR, the binding regions for long
neurotoxins were found (25-27) to reside within (but may not include all of) residues
tα1-10, tα32-49, tα100-115, tα122-138 and tα182-198 (Figure 2). In human AChR, long
neurotoxins bind to regions hα32-49, hα100-115, hα122-138 and hα194-210 (28,35).
For short-neurotoxin binding on the α-chain of tAChR, five Cot-binding regions (Figure 3)
were found to reside within peptides tα1-16, the tα23-38/tα34-49 overlap, tα100-115,
tα122-138 and tα194-210. The Eb-binding regions were localized (Figure 3) within
the tα23-38/tα34-49/tα45-60 overlap and peptides tα100-115 and tα122-138. The main binding
activity for both toxins resided within region tα122-138. The binding of long α-neurotoxins
(BTX and Cbt) involved the same regions of tAChR as well as an additional region
within residues α182-198. Thus, region α182-198, which is the strongest binding
region for long neurotoxins on tAChR, was not a binding region for short neurotoxins
(29). On hAChR, peptide hα122-138 possessed the highest activity with both toxins
(Figure 4), and lower activity was found in the hα23-38/hα34-49/hα45-60 overlap and
in peptide hα194-210. In addition, peptides hα100-115 and hα56-71 showed strong and
medium binding activity, respectively, to Eb but low activity to Cot, whereas peptide hα1-16
exhibited low binding to Cot and no binding to Eb. Comparison with the aforementioned studies
(28,35) indicated that, for hAChR, the binding regions of short and long neurotoxins
were essentially the same (29). The finding that the region within residues α122-138 of
both human and Torpedo AChR possessed the highest binding activity with short neurotoxins
indicated that this region constitutes a universal binding region for long and short
neurotoxins on AChR from various species (29).

Figure 1. Covalent structures of the synthetic overlapping peptides representing the extracellular part of each
of the α-chains of human and Torpedo californica AChRs. The upper sequences of each pair of peptides give
the full primary structures of the human AChR peptides and, under these, only the residues that are different
in the corresponding Torpedo peptides are given. Segments in bold type represent the five-residue overlaps
between consecutive peptides.

Figure 2. Summary of the binding profiles of (a) BTX and (b) cobratoxin to the synthetic overlapping peptides
of tAChR. The bars represent the maximum binding values to 25 µl of a 1:1 (v/v) suspension of each peptide
adsorbent. (From Mulac-Jericevic and Atassi, refs. 26, 27.)
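The overlapping-peptide strategy covers the extracellular region with consecutive peptides that share five residues (Figure 1). The sketch below idealizes the design as a uniform sliding window, which is an assumption. With 16-residue peptides and a step of 11, it reproduces boundaries quoted in the text such as α1-16, α23-38, α34-49, α45-60 and α100-115. The actual panel departs from a uniform step (for example α122-138, α182-198 and α194-210), so this is not the paper's peptide set. Function names are hypothetical.

```python
def overlapping_windows(first: int, last: int, length: int, overlap: int) -> list[tuple[int, int]]:
    """Inclusive (start, end) windows covering first..last with a fixed overlap.

    The final window is shifted back so that it ends exactly at `last`.
    """
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    step = length - overlap
    windows = []
    start = first
    while True:
        end = start + length - 1
        if end >= last:
            windows.append((max(first, last - length + 1), last))
            break
        windows.append((start, end))
        start += step
    return windows


def uncovered(windows: list[tuple[int, int]], first: int, last: int) -> list[int]:
    """Residues in first..last not contained in any window."""
    covered = {r for s, e in windows for r in range(s, e + 1)}
    return [r for r in range(first, last + 1) if r not in covered]


if __name__ == "__main__":
    for s, e in overlapping_windows(1, 210, 16, 5):
        print(f"alpha{s}-{e}")
```

Because the final window is pulled back to end at residue 210, it overlaps its neighbor by more than five residues. Designs like the one in Figure 1 instead adjust individual peptide lengths near the ends.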
[Figure 3 plot: binding of Cot and Eb to Torpedo AChR peptides; x-axis: peptide number]

Figure 3. Summary of the profiles of (a) cobrotoxin (Cot) and (b) erabutoxin (Eb) binding to the synthetic
overlapping peptides of the extracellular part of the α-chain of tAChR. The peptide numbers refer to the
sequences given in Fig. 1. The binding values of labeled Cot and Eb to tAChR were 55140 ± 1350 and
68550 ± 1520 cpm, respectively. Binding to unrelated proteins (bovine serum albumin, horse myoglobin) and
peptides (sperm-whale myoglobin synthetic peptides 1-17, 25-41 and 121-137; ref. 41) (negative controls)
was 650 ± 220 cpm. (From Ruan et al., ref. 29.)


Mapping of the Acetylcholine Receptor-Binding Sites on α-Bungarotoxin

The amino acid sequences of numerous snake venom toxins have been determined (36). These sequences tend to fall into three classes: short neurotoxins, long neurotoxins, and cytotoxins (reviewed in ref. 36). Pharmacologically active peptides, with effects ranging from those of the neurotoxins (i.e., muscle paralysis and respiratory failure) to those of the cytotoxins (i.e., hemolysis, cytolysis, cardiotoxic effects, and muscle depolarization), can be designed and synthesized based on the structures of short neurotoxins, long neurotoxins, and cytotoxins (37). Therefore, we adopted a synthetic approach to dissect the activities of these toxins. The approach has thus far been applied to localize the distinct AChR-binding regions on BTX. The entire toxin molecule was subdivided into unique, potentially active regions, and the peptides were designed to mimic as closely as possible the native regional structure (31).




M.  Z.  Atassi  and  B.  Z.  Dolimbek 


[Figure 4 plot: Binding of Cot and Eb to human AChR peptides; x-axis: Peptide Number (1-18)]

Figure 4. Summary of the binding profiles of Cot and Eb to the synthetic overlapping peptides of the extracellular part of the α-chain of human AChR. The peptide numbers refer to the sequences given in Fig. 1. (From Ruan et al., ref. 29.)


Accordingly, the following panel of BTX peptides was constructed (31) (Figure 5):

Loop 1 Peptide (L1): residues 3-16, with an artificial disulfide between the two terminal cysteines.

NH2-Terminal Extension of the Loop 1 Peptide (L1/N-Tail): residues 1-16, constructed as for L1 but further extended with the hydrophobic, potentially interactive NH2-terminal residues 1 and 2.

Loop 2 Peptide (L2): residues 26-41, with cysteine substitutions at both ends of the peptide providing an artificial intramolecular disulfide linkage between these two residues, and with alanine and threonine substitutions at residues 29 and 33, respectively, to eliminate the fifth disulfide of BTX (between Cys-29 and Cys-33) and thereby avoid the formation of disulfide-linked polymers (30).

Loop 3 Peptide (L3): residues 48-59, closed by a disulfide between the naturally occurring Cys-48 and Cys-59 of BTX at the termini of the peptide.
[Figure 5]

A. Covalent structure of the synthetic BTX peptides

L1 (residues 3-16):          C-H-T-T-A-T-I-P-S-S-A-V-T-C-(G)
L1/N-tail (residues 1-16):   I-V-C-H-T-T-A-T-I-P-S-S-A-V-T-C-(G)
L2 (residues 26-41):         C-K-M-W-A-D-A-F-T-S-S-R-G-K-V-V-E-C
L2(G) (residues 26-41):      C-K-M-W-A-D-A-F-T-S-S-R-G-K-V-V-E-C-G
L3 (residues 48-59):         C-P-S-K-K-P-Y-E-E-V-T-C-(G)
L3/Ext (residues 45-59):     A-A-T-C-P-S-K-K-P-Y-E-E-V-T-C-(G)
L4/C-tail (residues 60-74):  C-S-T-D-K-C-N-H-P-P-K-R-Q-P-G
C-tail (residues 66-74):     N-H-P-P-K-R-Q-P-G

B. Covalent structure of the randomized-sequence analogs of the BTX peptides

R.L1/N-tail:  T-H-C-I-T-V-A-S-T-P-I-T-S-V-A-C-G
R.L2:         C-W-V-R-D-T-A-M-F-K-G-A-K-S-E-V-S-C-G
R.L3/Ext:     K-S-P-C-A-Y-K-E-P-E-T-T-V-A-C-G

Figure 5. Structure of the synthetic peptides representing (a) the loops and exposed regions of BTX and (b) three peptide analogs that had the same amino acid composition as the respective peptides L1/N-tail, L2 and L3/Ext, but whose sequences were randomized. (Figure is from Atassi et al., ref. 39.)
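Figure 5B rests on the claim that each randomized analog keeps the amino acid composition of its parent peptide while scrambling the order. A minimal Python check of that property is sketched below. The one-letter strings are transcribed from Figure 5, with the optional C-terminal Gly shown as (G) included in each parent so that it lines up with the analog; the names `PAIRS` and `same_composition` are illustrative, not part of the original work.

```python
from collections import Counter

# Parent peptide (with C-terminal Gly) and its randomized analog, from Figure 5.
PAIRS = {
    "L1/N-tail": ("IVCHTTATIPSSAVTCG", "THCITVASTPITSVACG"),
    "L2(G)": ("CKMWADAFTSSRGKVVECG", "CWVRDTAMFKGAKSEVSCG"),
    "L3/Ext": ("AATCPSKKPYEEVTCG", "KSPCAYKEPETTVACG"),
}


def same_composition(parent: str, randomized: str) -> bool:
    """True if both sequences contain the same residues in the same numbers."""
    return Counter(parent) == Counter(randomized)


for name, (parent, randomized) in PAIRS.items():
    # Same composition, but a different sequence order.
    print(name, same_composition(parent, randomized), parent != randomized)
# prints:
# L1/N-tail True True
# L2(G) True True
# L3/Ext True True
```

Comparing `Counter` objects tests composition only; it says nothing about whether the scrambled order removes the native conformation, which is what the randomized controls were meant to test experimentally.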


Loop 3 Extended toward the NH2 Terminus by Three Residues (L3/Ext): Ala-45, Ala-46, and Thr-47 added at the NH2 terminus of loop 3 (i.e., residues 45-59).

An Extension of the COOH-Terminal Tail (L4/C-Tail): residues 60-74, which include the fourth loop of BTX, between Cys-60 and Cys-65.

A COOH-Terminal Linear Peptide (C-Tail): residues 66-74.

In  all  experiments,  the  peptides  were  purified  and,  when  appropriate,  the  monomeric 
cyclic  structures  were  prepared. 

The ability of these peptides to bind TAChR was studied (31) by radiometric adsorbent titrations. Three regions, represented by peptides 1-16, 26-41, and 45-59, were able to bind ¹²⁵I-labeled TAChR and, conversely, ¹²⁵I-labeled peptides were bound by TAChR. In these regions, residues Ile-1, Val-2, Trp-28, Lys-26 and/or Lys-38, and one or all of the three residues Ala-45, Ala-46, and Thr-47 are essential contact residues in the binding of BTX to the receptor. Other synthetic regions of BTX showed little or no TAChR-binding activity. The specificity of TAChR binding to peptides 1-16, 26-41, and 45-59 was confirmed by inhibition with unlabeled BTX.
Other parts of the BTX molecule make little or no contribution to its binding to AChR. The region within peptide L2 makes a higher contribution to the binding activity of BTX than do the regions within peptides L1/N-tail and L3/Ext. Radioiodination of peptides L1/N-tail (most likely at His-4) and L3/Ext (most likely at Tyr-54) appears to have some adverse effect on the binding of the respective peptide to TAChR. Thus, peptides L1/N-tail and L3/Ext exhibited lower affinity than peptide L2 when the binding of the labeled peptides to TAChR adsorbent was inhibited by unlabeled BTX (IC50 values: L2, 8.4 × 10^… M; L1/N-tail, 8.2 × 10^… M; L3/Ext, 4.4 × 10^… M). By contrast, the three peptides had comparable affinities (IC50 values: L2, 1.5 × 10^… M; L1/N-tail, 4.2 × 10^… M; L3/Ext, 5.1 × 10^… M) when binding of ¹²⁵I-labeled TAChR to peptide adsorbents was inhibited by unlabeled BTX, giving almost superimposable inhibition curves (31).

It was concluded (31) that BTX has three main AChR-binding regions (loop 1 with its NH2-terminal extension, loop 2, and loop 3 extended toward the NH2 terminus by residues 45-47).


The α-Neurotoxin Binding Cavity of Human AChR

We have recently described an approach for studying the details of protein-protein recognition (35). Each of the active peptides of one protein is allowed to interact with each of the active peptides of the other protein. Based on the relative binding affinities of these peptide-peptide interactions, the disposition of the two protein molecules in a complex can be described if the 3-D structure of one of them is known. The binding-site peptides of the protein whose 3-D structure is not known are docked onto the appropriate regions of the protein whose 3-D structure is known, by computer graphics and energy minimization, thus allowing a 3-D model of the unknown binding-site cavity to be constructed. The validity of this approach was first established with peptides corresponding to regions on the β chain of human hemoglobin involved in binding to the α chain (35). As mentioned above, the regions on HAChR and TAChR that bind BTX have been localized. Also, the binding regions for TAChR on BTX were mapped by synthetic peptides representing each of the BTX loops (30,31). In recent work (35), peptides representing the active regions of one molecule were allowed to bind to each of the active-region peptides of the other molecule. Thus, the interaction of three synthetic BTX loop peptides with four synthetic peptides representing the toxin-binding regions on HAChR permitted the determination of the region-region interactions between BTX and the human receptor. Based on the known 3-D structure of BTX (38), the active peptides of HAChR were then assembled onto their appropriate toxin-contact regions by computer model building and energy minimization. This allowed the three-dimensional construction of the toxin-binding cavity on HAChR (Figure 6). The cavity appears to be conical, 30.5 Å in depth, and involves several AChR regions that make contact with the BTX loop regions. One AChR region involved in the binding to BTX (within residues α125-136) also resides in the aforementioned ACh-binding site (5). This demonstrates in three dimensions a critical site involved in both ACh activation and BTX blocking. Thus, studying the interaction between peptides representing the binding regions of two protein molecules may provide an approach to molecular recognition by which the binding site on one protein can be described if the 3-D structure of the other protein is known (35).
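The bookkeeping behind the peptide-peptide step can be sketched as follows, assuming that a larger binding signal means a stronger region-to-region contact. The sketch enumerates every pairing of the three BTX loop peptides with four receptor peptides (3 × 4 = 12 assays) and, for each toxin peptide, keeps the receptor peptide it binds most strongly. The receptor labels R1-R4 are hypothetical placeholders, and the docking and energy-minimization steps described above are not modeled.

```python
from itertools import product

# BTX loop peptides named in the text.
TOXIN_PEPTIDES = ["L1/N-tail", "L2", "L3/Ext"]
# Hypothetical labels for the four toxin-binding peptides of HAChR.
RECEPTOR_PEPTIDES = ["R1", "R2", "R3", "R4"]


def assay_plan(toxin_peps, receptor_peps):
    """Every toxin-peptide x receptor-peptide pairing to be assayed."""
    return list(product(toxin_peps, receptor_peps))


def strongest_contacts(binding):
    """Map each toxin peptide to the receptor peptide it binds most strongly.

    `binding` maps (toxin_peptide, receptor_peptide) -> binding signal.
    Assumption: larger means stronger; for IC50-type data, rank by the
    smallest value instead.
    """
    best = {}
    for (tox, rec), signal in binding.items():
        if tox not in best or signal > best[tox][1]:
            best[tox] = (rec, signal)
    return {tox: rec for tox, (rec, _) in best.items()}


print(len(assay_plan(TOXIN_PEPTIDES, RECEPTOR_PEPTIDES)))  # prints 12
```

Keeping only the single strongest partner per toxin peptide is a simplification; a real contact map would also weigh weaker pairings, since one toxin loop can touch more than one receptor region.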
Figure 6. A stereo drawing of a 3-D construction of the toxin-binding cavity in AChR with the BTX molecule (backbone only) bound in the cavity (Upper) and without the BTX molecule (Lower). The somewhat conical cavity has the following dimensions: residues 100-136, 21.3 Å; residues 136-32, 35.0 Å; residues 32-198, 16.06 Å; and residues 198-100, 22.13 Å. The depth of the cavity is 30.48 Å. (From Ruan et al., ref. 35.)


Antibody and T-cell Recognition of α-Bungarotoxin and Its Synthetic Peptides

The aforementioned peptides representing the loops and surface regions of BTX, as well as control peptide analogs in which these sequences were randomized, were used to map the recognition profiles of the antibodies and T-cells obtained after BTX immunization (39). The abilities of anti-peptide antibodies and T-cells to recognize the immunizing peptide and BTX were also determined (39).

Responses  to  Immunization  with  BTX 

Three regions of BTX were immunodominant with both rabbit and mouse anti-BTX antibodies (Table 1). These regions resided within loops L1 (residues 3-16) and L2 (residues 26-41) and the C-terminal tail (residues 66-74) of the toxin. The regions recognized by BTX-primed T-lymphocytes were mapped in five mouse strains: C57BL/6 (H-2b), Balb/c (H-2d), CBA (H-2k), C3H/He (H-2k) and SJL (H-2s). The H-2b and H-2d haplotypes were high responders to BTX, while the H-2k and H-2s haplotypes were intermediate responders. The T-cell recognition profile of the peptides varied with the haplotype (Figure 7), consistent with Ir-gene control of the responses to the individual regions. The submolecular specificities of antibodies and T-cells were compared in three of the mouse strains (C57BL/6, Balb/c and SJL) (Table 2). In a given mouse strain, there were regions that were strongly recognized by both antibodies and T-cells, as well as regions that were predominantly recognized either by antibodies or by T-cells.


Table 1. Binding of anti-BTX antibodies to synthetic BTX peptides.

                        ¹²⁵I-labeled antibodies bound (cpm)ᵃ
                        Mouse        Mouse        Mouse        Rabbit
Peptides                antiserum    antiserum    antiserum    antiserum
                        #233         #253         #236
BTX                     36,480       34,746       38,015       44,120
L1                      14,675       15,220       18,715       16,768
L1/N-tail               13,040       14,706       16,420       14,487
L2                       5,204        6,500        8,590       10,952
L2(G)                    1,879        2,361        3,340        4,953
L3                       2,419        3,380        2,570        3,682
L3/Ext                   1,796        1,830        2,340        4,952
L4/C-tail                5,105        7,256        7,374        9,122
C-tail                  16,481       11,588       12,256       17,468
Controls
R.L2                       875        1,092        1,324        1,108
R.L3/Ext                   962          867        1,011        1,487
Nonsense peptide           103          683          764          684
Bovine serum albumin       895        1,121        1,157          985
Myoglobin                  725          985        1,270           nd

ᵃResults were obtained by radioimmunoadsorbent titrations and represent the average plateau values of three replicate analyses, which varied by ±1.3% or less. Values have not been corrected for non-specific binding, but values of negative controls are shown. (Table is from ref. 39.)


Table 2. Comparison of the specificities of antibody and T-cell responses against BTX in three independent mouse haplotypes

                Response levels following BTX immunization*
                C57BL/6 (H-2b)      Balb/c (H-2d)       SJL (H-2s)
BTX or peptide  Antibody  T-cell    Antibody  T-cell    Antibody  T-cell
BTX             5+        4+        5+        5+        5+        3+
L1              3+        2+        4+        2+        4+        2+
L1/N-tail       3+        3+        4+        3+        3+        2+
L2              2+        2+        2+        4+        2+        2+
L3              1+        2+        2+        3+        2+
L3/Ext          1+        3+        2+        3+        2+        2+
L4/C-tail       2+        3+        3+        4+        2+        2+
C-tail          3+        3+        4+        4+        4+        2+

*The number of + signs denotes the following levels of antibody and T-cell responses (in net cpm): 1+, 3,000-5,000; 2+, 5,100-10,000; 3+, 10,000-20,000; 4+, 20,100-30,000; 5+, >30,000. (Table is from ref. 39.)
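The + scale in the Table 2 footnote is a binning of net cpm values. A minimal sketch of that binning follows. Two points are assumptions, not statements from the source: values in the small gaps between the published ranges (e.g. 5,001-5,099) are assigned to the higher grade, and net cpm is obtained by subtracting a single negative-control background from the raw count.

```python
def response_grade(net_cpm: float) -> str:
    """Map a net cpm value onto the 1+ to 5+ scale of Table 2.

    Assumption: each grade starts just above the previous grade's upper
    bound, so gaps in the published ranges go to the higher grade.
    Values below 3,000 cpm are under the scale and reported as "-".
    """
    if net_cpm > 30_000:
        return "5+"
    if net_cpm > 20_000:
        return "4+"
    if net_cpm > 10_000:
        return "3+"
    if net_cpm > 5_000:
        return "2+"
    if net_cpm >= 3_000:
        return "1+"
    return "-"


def net_cpm(raw_cpm: float, background_cpm: float) -> float:
    """Subtract a non-specific (negative-control) background from a raw count."""
    return raw_cpm - background_cpm
```

As an exercise of the functions only (Table 2 was scored from separate antibody and T-cell data): the uncorrected Table 1 count for BTX with mouse antiserum #233 (36,480 cpm), minus that antiserum's bovine serum albumin control (895 cpm), gives 35,585 net cpm, which `response_grade` places in the 5+ band.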


[Figure 7: bar chart of T-cell proliferative responses; x-axis: Challenge Antigen]

Figure 7. A bar diagram showing the maximum in vitro proliferative responses (at optimum doses) of T cells from BTX-primed mice to BTX, its peptides, or negative controls: (a) Balb/c; (b) CBA/JCr; (c) C3H/He; (d) C57BL/6; (e) SJL. The challenge antigens were: (1) BTX; (2) L1; (3) L1/N-tail; (4) L2; (5) L2(G); (6) L3; (7) L3/Ext; (8) L4/C-tail; (9) C-tail; (10) bovine serum albumin; (11) myoglobin; (12) nonsense peptide. (Figure is from Atassi et al., ref. 39.)
Responses to Immunization with BTX Peptides

The peptides were used as immunogens in their free form (i.e., without coupling to any carrier) in two of the mouse strains, Balb/c and SJL (39). In both mouse strains, the peptides gave strong antibody responses (Table 3). Antibodies against peptide L2 showed the highest binding to intact BTX. Antibodies against the other peptides exhibited lower binding activity to the intact toxin, and this activity depended on the peptide and on the mouse strain (39). The response of peptide-primed T-cells to a given immunizing peptide was not related to whether this region was immunodominant with BTX-primed T-cells. The ability of peptide-primed T-cells to recognize the intact toxin varied with the peptide and depended on the host strain (Figure 8). These results indicate that anti-peptide antibody and T-cell responses are also under genetic control, and that their ability to cross-react with the parent toxin does not depend solely on the conformational exposure of the corresponding region in intact BTX.

Protective  Immune  Responses  by  Immunization  with  the  Synthetic  BTX 
Peptides 

Since the differences in the abilities of the antibodies against the various peptides to bind intact BTX were not very significant, and since these activities did not necessarily correlate with the ability of anti-peptide T-cells to recognize BTX, it was decided to test each of the peptides for its capacity to generate protective immune responses (40).

In both Balb/c and SJL mice, peptides L1, L2 and C-tail were the most protective against BTX poisoning (Table 4) (protection index, PI, defined as the LD50 of immunized mice divided by the LD50 of unimmunized controls: Balb/c, 3.2; SJL, 2.5-2.7). Protective immunity exhibited by the other peptides was also quite substantial (PI: Balb/c, 2.5-2.6; SJL, 2.2-2.4). It is noteworthy that the three most protective peptides (L1, L2, and C-tail) were also immunodominant in terms of binding of anti-toxin antibodies, suggesting perhaps that, for identification of the most protective regions, it would have been sufficient to map the regions that are immunodominant toward anti-BTX antibodies (40).
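The protection index is a simple ratio of LD50 values. As a sketch, the Python below recomputes PI from the LD50 values listed in Table 4 (µg BTX per mouse); the printed indices should agree with the tabulated ones to within rounding in the second decimal. The dictionary and function names are illustrative.

```python
# LD50 of unimmunized (or random-peptide) control mice, from Table 4.
CONTROL_LD50 = {"Balb/c": 3.20, "SJL": 3.60}

# LD50 of immunized mice, from Table 4 (µg BTX/mouse).
LD50 = {
    "Balb/c": {"L1": 10.27, "L1/N-tail": 8.36, "L2": 10.27, "L3": 7.86,
               "L3/Ext": 8.36, "L4/C-tail": 8.36, "C-tail": 10.27, "BTX": 31.0},
    "SJL": {"L1": 8.86, "L1/N-tail": 7.86, "L2": 9.76, "L3": 7.94,
            "L3/Ext": 8.57, "L4/C-tail": 8.64, "C-tail": 8.86, "BTX": 26.50},
}


def protection_index(ld50_immunized: float, ld50_control: float) -> float:
    """PI = LD50 of immunized mice / LD50 of unimmunized controls."""
    return ld50_immunized / ld50_control


for strain, table in LD50.items():
    for antigen, ld50 in table.items():
        pi = protection_index(ld50, CONTROL_LD50[strain])
        print(f"{strain:7s} {antigen:10s} PI = {pi:.2f}")
```

A PI of 1.0 means no protection; because the index is normalized to each strain's own control LD50, it allows Balb/c and SJL to be compared even though their baseline sensitivity to BTX differs (3.20 vs 3.60 µg per mouse).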

Table 3. Binding of anti-peptide antibodies to the immunizing peptide and to BTX

Antibodies bound (net cpm)

            Balb/c antibodies                      SJL antibodies
            No. of   Binding to   Binding to       No. of   Binding to   Binding to
Antigen     mice     peptide      BTX              mice     peptide      BTX
L1          10       58,942       16,432           8        77,639        2,128
L1/N-tail    8       95,793        8,460           9        62,183        1,964
L2           9       88,619       12,831           9        65,676       10,735
L3           9       49,601        6,200           10       38,654        1,074
L3/Ext      10       35,217        1,447           8        43,704        3,361
L4/C-tail    8       63,909        4,170           10       73,795        3,802
C-tail       8       43,306        5,149           9        44,393        6,062

Antisera were raised against each of the peptides in Balb/c and SJL mice and represent pools of the 87-day bleeds from the number of mice shown. For RIA, the antisera were pre-diluted 1:500 (v/v) with 0.15 M NaCl in 0.01 M sodium phosphate buffer, pH 7.2, containing 0.1% bovine serum albumin (details given in ref. 39). Values have been corrected for non-specific binding of each antiserum to negative controls, the levels of which were similar to those shown in Table 1. (Table is from ref. 39.)

Since each of the peptides L1, L2, and C-tail was quite protective (increasing the LD50 of BTX about 3-fold relative to control mice), it was important to determine whether higher protection would be achieved by immunizing mice with all three peptides simultaneously. These studies (which were done only in Balb/c) clearly showed that this was indeed the case (40). Immunization with an equimolar mixture of the peptides allowed the mice to survive BTX challenge doses 4.6-fold higher than the LD50 of control mice (Table 4). In other words, immunization with an equimolar mixture of peptides L1, L2 and C-tail was 42% more protective, in terms of survivable BTX challenge dose, than any of the three peptides by itself. Clearly, antibodies against all three regions are more efficient at neutralizing toxin poisoning than antibodies against any single region. The protective capacity of the peptide mixture was somewhat related to the titer of the anti-peptide antibody fraction that binds to BTX. However, the titers of these antibodies were moderate and did not increase substantially over an extended period of immunization. It was therefore decided to determine the protective ability of a peptide-carrier conjugate.
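The 4.6-fold and 42% figures follow directly from the Balb/c LD50 values in Table 4. The short Python sketch below reproduces that arithmetic; the variable names are illustrative.

```python
# Balb/c LD50 values (µg BTX/mouse) from Table 4.
LD50_CONTROL = 3.20
LD50_SINGLE = {"L1": 10.27, "L2": 10.27, "C-tail": 10.27}
LD50_MIXTURE = 14.63  # equimolar mixture of L1, L2 and C-tail

# Survivable dose relative to unimmunized controls (the mixture's PI).
fold_over_control = LD50_MIXTURE / LD50_CONTROL

# Gain over the best single peptide, in terms of survivable challenge dose.
best_single = max(LD50_SINGLE.values())
gain_over_single = LD50_MIXTURE / best_single - 1

print(f"{fold_over_control:.1f}-fold over controls")          # 4.6-fold over controls
print(f"{gain_over_single:.0%} more than best single peptide")  # 42% more than best single peptide
```

Computing the gain from the protection indices instead (4.57 / 3.21) gives the same 42%, since both indices share the same control LD50.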

The three peptides L1, L2 and C-tail were conjugated to a single carrier, ovalbumin (40). Analysis of the conjugate showed that the coupling levels of the peptides differed. This is to be expected, because the peptides differ in the reactivity of their side chains and in their accessibility requirements on the surface of the ovalbumin carrier. Importantly, the conjugate generated high-titer antibodies that bound to intact BTX. This immunogen (i.e., the conjugate) afforded excellent protection against BTX challenge (PI > 18.1) (Table 4). In fact, the multi-peptide conjugate was almost twice as protective as whole-toxin immunization (PI = 9.7). In addition, unlike BTX, the multi-peptide conjugate is not toxic and, therefore, there is no risk of poisoning the recipient with the immunogen during vaccination. Clearly, the multi-peptide conjugate should constitute an excellent vaccine against toxin poisoning.


Figure 8. A bar diagram showing the responses of peptide-primed T cells to the optimum challenge dose of the respective priming peptide, BTX or random peptides. Peptide-primed T-cells were from: (a) Balb/c mice; (b) SJL mice. (Figure is from Atassi et al., ref. 39.)


Table 4. Protection of mice against BTX by immunization with BTX or with synthetic BTX peptides

                               Balb/c                          SJL
Immunizing antigen             LD50            Protection      LD50            Protection
                               (µg BTX/mouse)  index           (µg BTX/mouse)  index
None or random peptidesᵃ       3.20            1.00            3.60            1.00
L1                             10.27           3.21            8.86            2.46
L1/N-tail                      8.36            2.61            7.86            2.18
L2                             10.27           3.21            9.76            2.71
L3                             7.86            2.46            7.94            2.21
L3/Ext                         8.36            2.61            8.57            2.38
L4/C-tail                      8.36            2.61            8.64            2.40
C-tail                         10.27           3.21            8.86            2.46
BTX                            31.0            9.69            26.50           7.36
Mixture of L1, L2, C-tail      14.63           4.57            nd              nd
Multi-peptide conjugate
  of L1, L2, C-tail            >57.8           >18.1           nd              nd

ᵃThis group includes 45 unimmunized mice and mice that were immunized with the randomized-sequence peptides R.L1/N-tail (45 mice), R.L2 (45 mice), and R.L3/Ext (45 mice).


CONCLUSIONS 

Using a comprehensive synthetic peptide strategy that originated in this laboratory (21,22), we have mapped the extracellular domain of the AChR α-chain for accessibility to anti-peptide antibodies in isolated membrane fractions and in live muscle cells in culture. We also mapped on this domain, in TAChR and in HAChR, the regions that are recognized by antibodies and by T-lymphocytes against the respective AChR and by long and short α-neurotoxins. Conversely, we mapped, by synthesis, the regions on BTX that bind the receptor, as well as the antigenic regions of BTX that are recognized by rabbit and mouse anti-BTX antibodies. Three regions, residing within peptides L1, L2 and C-tail, were immunodominant. The regions recognized by BTX-primed T-cells were also mapped in five mouse strains. Immunization of Balb/c and SJL mice with each of the synthetic peptides in its free form afforded considerable protection against BTX poisoning. Peptides L1, L2 and C-tail were the most protective: mice immunized with these peptides had LD50 values about three times higher than those of non-immune control mice. Immunization with an equimolar mixture of the three peptides was even more protective, and these mice survived even higher challenge doses of BTX (4.6-fold higher than the LD50 of controls). An ovalbumin conjugate carrying all three peptides, when used as an immunogen, displayed a high protective capability, almost double that obtained by BTX immunization. The conjugate of the three peptides should serve as an effective vaccine against BTX poisoning.


Regions  of  Interaction  between  Nicotinic  AChR  and  a-Neurotoxins 




ACKNOWLEDGMENTS 

The work described in this article was supported by a grant (NS-26280) from the National Institutes of Health and by a contract (DAMD17-89-C-9061) from the Department of Defense, U.S. Army Medical Research and Development Command.


REFERENCES 

1. Karlin, A. (1980) Molecular properties of nicotinic acetylcholine receptors. In Cell Surface and Neuronal
Function (Edited by Cotman, C.W., Poste, G. and Nicolson, G.L.), pp. 191-260. Elsevier/North-Holland
Biomedical Press, New York.

2. Changeux, J.P., Devillers-Thiery, A. and Chemouilli, P. (1984) Acetylcholine receptor: an allosteric
protein. Science 225, 1335-1345.

3.  Sobel,  A.,  Weber,  M.  and  Changeux,  J.P.  (1977)  Large-scale  purification  of  the  acetylcholine-receptor 
protein  in  its  membrane-bound  and  detergent-extracted  forms  from  Torpedo  marmorata  electric  organ. 
Eur.  J.  Biochem.  80,  215-224. 

4. Tzartos, S.J. and Changeux, J.P. (1983) High affinity binding of α-bungarotoxin to the purified α-subunit
and its 27,000-dalton proteolytic peptide from Torpedo marmorata acetylcholine receptor. Requirements
for sodium dodecyl sulfate. EMBO J. 2, 381-387.

5. McCormick, D.J. and Atassi, M.Z. (1984) Localization and synthesis of the acetylcholine-binding site in
the α-chain of the Torpedo californica acetylcholine receptor. Biochem. J. 224, 995-1000.

6.  Lee,  C.Y.  (1979)  Recent  advances  and  pharmacology  of  snake  toxins.  Adv.  Cytopharmacol.  3,  1-16. 

7. Maelicke, A., Fulpius, B.W., Klett, R.P. and Reich, E. (1977) Acetylcholine receptor. Responses to drug
binding. J. Biol. Chem. 252, 4811-4830.

8. Haggerty, J.G. and Froehner, S.C. (1981) Restoration of ¹²⁵I-α-bungarotoxin binding activity to the
α-subunit of Torpedo acetylcholine receptor isolated by gel electrophoresis in sodium dodecyl sulfate.
J. Biol. Chem. 256, 8294-8297.

9. Noda, M., Takahashi, H., Tanabe, T., Toyosato, M., Furutani, Y., Hirose, T., Asai, M., Inayama, S., Miyata,
T. and Numa, S. (1982) Primary structure of α-subunit precursor of Torpedo californica acetylcholine
receptor deduced from cDNA sequence. Nature (London) 299, 793-797.

10. Noda, M., Furutani, Y., Takahashi, H., Toyosato, M., Tanabe, T., Shimizu, S., Kikyotani, S., Kayano, T.,
Hirose, T., Inayama, S. and Numa, S. (1983) Cloning and sequence analysis of calf cDNA and human
genomic DNA encoding alpha-subunit precursor of muscle acetylcholine receptor. Nature 305, 818-823.

11. Noda, M., Takahashi, H., Tanabe, T., Toyosato, M., Kikyotani, S., Miyata, T. and Numa, S. (1983) Structural
homology of Torpedo californica acetylcholine receptor subunits. Nature 302, 528-532.

12. Claudio, T., Ballivet, M., Patrick, J. and Heinemann, S. (1983) Nucleotide and deduced amino acid
sequences of Torpedo californica acetylcholine receptor subunit. Proc. Natl. Acad. Sci. USA 80,
1111-1115.

13. Isenberg, K.E., Mudd, J., Shah, V. and Merlie, J.P. (1986) Nucleotide sequence of the mouse muscle
nicotinic acetylcholine receptor alpha subunit. Nucleic Acids Res. 14, 5111-5111; Boulter, J., Evans, K.,
Goldman, D., Martin, G., Treco, D., Heinemann, S. and Patrick, J. (1986) Isolation of a cDNA clone
coding for a possible neural nicotinic acetylcholine receptor α-subunit. Nature (London) 319, 368-374.

14. Buonanno, A., Mudd, J., Shah, V. and Merlie, J.P. (1986) A universal oligonucleotide probe for acetylcholine
receptor genes: Selection and sequencing of cDNA clones of the mouse muscle beta subunit.
J. Biol. Chem. 261, 16451-16458.

15. Yu, L., LaPolla, R.J. and Davidson, N. (1986) Mouse nicotinic acetylcholine receptor gamma subunit: cDNA
sequence and gene expression. Nucleic Acids Res. 14, 3539-3555.

16.  LaPolla,  R.J.,  Mayne,  K.M.  and  Davidson,  N.  (1984)  Isolation  and  characterization  of  a  cDNA  clone  for 
the  complete  protein  coding  region  of  the  delta  subunit  of  the  mouse  acetylcholine  receptor.  Proc.  Natl. 
Acad.  Sci.  USA  81,  7970-7974. 

17. Noda, M., Takahashi, H., Tanabe, T., Toyosato, M., Kikyotani, S., Hirose, T., Asai, M., Takashima, H.,
Inayama, S., Miyata, T. and Numa, S. (1983) Primary structures of β- and δ-subunit precursors of Torpedo
californica acetylcholine receptor deduced from cDNA sequences. Nature (London) 301, 251-255.

18.  Guy,  H.R.  (1983)  A  structural  model  of  the  acetylcholine  receptor  channel  based  on  partition  energy  and 
helix  packing  calculations.  Biophys.  J.  45,  249-261. 




19. Finer-Moore, J. and Stroud, R.M. (1984) Amphipathic analysis and possible formation of the ion channel
in an acetylcholine receptor. Proc. Natl. Acad. Sci. USA 81, 155-159.

20. Atassi, M.Z., Manshouri, T. and Yokoi, T. (1988) Recognition of inter-transmembrane regions of
acetylcholine receptor α subunit by antibodies, T cells and neurotoxins. Implications for membrane
subunit organization. FEBS Lett. 228, 295-300.

21. Kazim, A.L. and Atassi, M.Z. (1980) A novel and comprehensive synthetic approach for the elucidation
of protein antigenic structures. Determination of the full antigenic profile of the α-chain of human
haemoglobin. Biochem. J. 191, 261-264.

22. Kazim, A.L. and Atassi, M.Z. (1982) Structurally inherent antigenic sites. Localization of the antigenic
sites of the α-chain of human haemoglobin in three host species by a comprehensive synthetic approach.
Biochem. J. 203, 201-208.

23. Atassi, M.Z. and Mulac-Jericevic, B. (1994) Mapping of the extracellular topography of the α-chain in
free and in membrane-bound acetylcholine receptor by antibodies against overlapping peptides spanning
the entire extracellular parts of the chain. J. Prot. Chem. 13, 37-47.

24. Jinnai, K., Ashizawa, T. and Atassi, M.Z. (1994) Analysis of exposed regions on the main extracellular
domain of mouse acetylcholine α-subunit in live muscle cells by binding profiles of anti-peptide
antibodies. J. Prot. Chem. 13, 715-722.

25. Mulac-Jericevic, B. and Atassi, M.Z. (1986) Segment α182-198 of Torpedo californica acetylcholine
receptor contains a second toxin-binding region and binds anti-receptor antibodies. FEBS Lett. 199, 68-74.

26. Mulac-Jericevic, B. and Atassi, M.Z. (1987) α-Neurotoxin binding to acetylcholine receptor: localization
of the full profile of the cobratoxin-binding regions in the α-chain of Torpedo californica acetylcholine
receptor by a comprehensive synthetic strategy. J. Prot. Chem. 6, 365-373.

27. Mulac-Jericevic, B. and Atassi, M.Z. (1987) Profile of the α-bungarotoxin binding regions on the
extracellular part of the α-chain of Torpedo californica acetylcholine receptor. Biochem. J. 248, 847-852.

28. Mulac-Jericevic, B., Manshouri, T., Yokoi, T. and Atassi, M.Z. (1988) The regions of α-neurotoxin
binding on the extracellular part of the α-subunit of human acetylcholine receptor. J. Prot. Chem. 7,
173-177.

29. Ruan, K.-H., Stiles, B.G. and Atassi, M.Z. (1991) The short-neurotoxin binding regions on the α-chain
of human and Torpedo californica acetylcholine receptors. Biochem. J. 274, 849-854.

30.  McDaniel,  C.S.,  Manshouri,  T.  and  Atassi,  M.Z.  (1987)  A  novel  peptide  mimicking  the  interaction  of 
a-neurotoxins  with  acetylcholine  receptor.  J.  Prot.  Chem.  6,  455-461. 

31.  Atassi,  M.Z.,  McDaniel,  C.S.  and  Manshouri,  T.  (1988)  Mapping  by  synthetic  peptides  of  the  binding 
sites  for  acetylcholine  receptor  on  a-bungarotoxin.  J.  Prot.  Chem.  1,  655-666. 

32.  Devillers-Thiery,  J.A.,  Giraudat,  J.,  Bentaboulet,  M.  and  Changeux,  J.P.  (1983).  Complete  mRNA  coding 
sequence  of  the  acetylcholine  binding  a-subunit  of  Torpedo  marmorata  acetylcholine  receptor:  A  model 
for  the  transmembrane  organization  of  the  polypeptide  chain.  Proc.  Natl.  Acad.  Sci.  USA,  80:2067-2071. 

33.  Lennon,  V.A.,  McCormick,  D.J.,  Lambert,  E.H.,  Griesmann,  G.E.  and  Atassi,  M.Z.  (1985)  Region  of 
peptide  125-147  of  acetylcholine  receptor  a-subunit  is  exposed  at  neuromuscular  junction  and  induces 
experimental  autoimmune  myasthenia  gravis,  T-cell  immunity  and  modulating  autoantibodies.  Proc.  Natl. 
Acad.  Sci.  USA,  82,  8805-8809. 

34.  Atassi,  M.Z.,  Ruan,  K.H.,  Jinnai,  K.,  Oshima,  M.  and  Ashizawa,  T.  (1992)  Epitope-specific  suppression 
of  antibody  response  in  experimental  autoimmune  myasthenia  gravis  by  an  mPEG  conjugate  of  a 
myasthenogenic  synthetic  peptide.  Proc.  Natl.  Sci.  USA,  89,  5852-5856. 

35.  Ruan,  K.-H.,  Spurlino,  J.,  Quiocho,  F.A.  and  Atassi,  M.Z.  (1990)  Acetylcholine  receptor  a-bungarotoxin 
interactions:  determination  of  the  region-to-region  contacts  by  peptide-peptide  interactions  and  molecular 
modeling  of  the  receptor  cavity.  Proc.  Natl.  Acad.  Sci.  USA,  87,  6156-6160. 

36.  Endo,  T.  and  Tamiya,  N.  (1987).  Current  view  on  the  structure  function  relationship  of  post-synaptic 
neurotoxins  from  snake  venom.  Pharmacol.  Then  34,  403-451. 

37.  Atassi,  M.Z.  (1991).  Postsynaptic-neurotoxin-acetylcholine  receptor  interactions  and  the  binding  sites  on 
the  two  molecules.  In:  Handbook  of  Natural  Toxins,  ed.  A.  Tu,  pp.  53-83,  Marcel  Dekker,  New  York. 

38.  Love,  R.A.  and  Stroud,  R.M.  (1986)  The  crystal  structure  of  a-bungartoxin  2.5.  A  resolution  related  to 
solution  structures  and  binding  to  acetylcholine  receptor.  Protein  Eng.  1  37-46. 

39.  Atassi,  M.Z.,  Dolimbek,  B.Z.  and  Manshouri,  T.  (1995)  Antibody  and  T-cell  recongition  of  a-bungaro¬ 
toxin  and  its  synthetic  loop  peptides.  Mol.  Immunol,  in  press 

40.  Dolimbek,  B.Z.  and  Atassi,  M.Z.  (1994)  a-Bugarotoxin  peptides  afford  a  synthetic  vaccine  against  toxin 
poisoning.  J.  Prot.  Chem.  13,  490-493. 

41.  Bixler,  G.S.  and  Atassi,  M.Z.  (1983)  Molecular  localization  of  the  full  profile  of  the  continuous  regions 
recognized  by  myoglobin-primed  T-cells  using  synthetic  overlapping  peptides  encompassing  the  entire 
molecule. /mmwwo/.  Commun.  12,  593-603. 


IMMUNOLOGICAL  APPROACH  TO  STUDY 
THE  STRUCTURE  OF  OXIDIZED  LOW 
DENSITY  LIPOPROTEINS 


28 


Chao-Yuh  Yang,  Natalia  V.  Valentinova,  Manlan  Yang,  Zi-Wei  Gu,  John 
R.  Guyton,  and  Antonio  M.  Gotto,  Jr. 

Department  of  Medicine 
Baylor  College  of  Medicine  and 
The  Methodist  Hospital 

6565  Fannin  Street,  MS/A601,  Houston,  Texas  77030 


INTRODUCTION 

Human low density lipoproteins (LDL), the major carriers of cholesterol in the bloodstream, play the major role in supplying the cells of tissues and organs with cholesterol. LDL is derived from the metabolism of the triglyceride-rich very low density lipoproteins (VLDL). Pathologic and epidemiologic studies have shown that a higher concentration of LDL in the circulation is correlated with the development of atherosclerosis. Apolipoprotein (apo) B-100 serves as the ligand for the LDL receptor on cell surfaces. Thus, apoB-100 occupies a crucial position in the metabolic pathway of cholesterol and LDL. The complete primary structure of apoB-100 has been determined from its cDNA sequence (Chen et al., 1986; Knott et al., 1986) and from its proteolytic peptide sequence information (Yang et al., 1986). ApoB-100 consists of 4536 amino acid residues with a calculated molecular mass of 513 kDa. Based on the differential trypsin releasability of apoB-100 in LDL, apoB can be divided into 5 domains. Domain 1 contains 14 of the 25 cysteine (Cys) residues in apoB. Sixteen of the 25 Cys residues (numbered 1 to 25 from the amino end to the carboxy end of apoB-100) exist in disulfide form. All 14 Cys residues in domain 1 are disulfide-linked, and all except Cys1-Cys3 and Cys2-Cys4 are linked to a neighboring Cys. Domain 4 contains 7 of the 16 N-linked carbohydrate chains (Yang et al., 1989). Based on the published structural information (Yang et al., 1990), we proposed that apoB-100 in LDL is likely to adopt an elongated form that wraps around the LDL particle, as shown in Fig. 1 (Yang et al., 1992). The process of atherogenesis is believed to involve the transformation of macrophages into lipid-laden foam cells. Degradation of native LDL by macrophages occurs at relatively low rates and does not cause any apparent accumulation of lipids in these cells (Goldstein et al., 1979). LDL modified in vitro by endothelial cells (Henriksen et al., 1983) or chemically (Goldstein et al., 1979) has been shown to enhance LDL interaction with macrophages and to cause their in vitro transformation into foam cells.
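A "calculated" molecular mass such as the 513 kDa quoted for apoB-100 is normally obtained by summing residue masses over the sequence. The sketch below shows that arithmetic. It assumes standard average residue masses, adds one water for the free termini, and optionally subtracts two hydrogens per disulfide bond. Carbohydrate is ignored. It is an illustration only and not necessarily the procedure used in the cited sequencing papers.

```python
# Average residue masses in daltons (each residue = free amino acid minus one H2O).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water accounts for the free N- and C-termini
H2 = 2.0159      # mass lost per disulfide bond (two hydrogen atoms)


def average_mass(sequence: str, n_disulfides: int = 0) -> float:
    """Average mass (Da) of an unglycosylated polypeptide chain."""
    residues = sum(AVG_RESIDUE_MASS[aa] for aa in sequence.upper())
    return residues + WATER - H2 * n_disulfides


# Consistency check on the figures in the text: 513 kDa / 4536 residues
mean_residue = 513_000 / 4536  # about 113 Da, a typical average for proteins
print(round(mean_residue, 1))
```

For apoB-100 itself, the full sequence would be passed in. The 16 disulfide-bonded Cys residues mentioned above would correspond to `n_disulfides=8`.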
Figure 1. Structure of apoB-100 in LDL (From Yang, C.Y. and Pownall, H.J., In Structure and Function of Plasma Apolipoproteins, M. Rosseneu, Ed., CRC Press, Inc., 1992. With Permission).


Modification  of  LDL  that  occurs  upon  incubation  with  endothelial  cells  was  demonstrated 
to  be  similar  to  LDL  oxidation  in  the  presence  of  oxygen  and  transition  metal  ions,  causing 
similar  changes  in  LDL  structure  that  could  be  prevented  by  addition  of  antioxidants  such 
as  vitamin  E  and  butylated  hydroxytoluene  (BHT)  or  chelators  such  as  EDTA  (Steinbrecher 
et  al.,  1984). 

Changes in the immunoreactivity of apoB upon LDL oxidation in the presence of Cu²⁺ have been reported by a number of laboratories. With different antibodies, ox-LDL showed no change in immunoreactivity (Gandjini et al., 1991), reduced reactivity (Young et al., 1986; Zawadzki et al., 1989), or increased and then decreased reactivity (Zawadzki et al., 1989; Gandjini et al., 1991). We investigated 12 well-characterized MAbs. Of these MAbs, 2 showed no or very little change, 6 had reduced reactivity, one (B6) had increased reactivity during the first 8 hours and then diminished reactivity upon prolonged oxidation, and one (4C11) showed immunoreactivity that increased as a function of time up to 24 hours and then remained stable for another 16 hours (Valentinova et al., 1994).


MATERIALS  AND  METHODS 
Antibodies 

MAbs against LDL were the mouse antibodies Mb43 (Pease et al., 1990), Mb47 (Young et al., 1986), BL3, BL7 (Pease et al., 1990; Salmon et al., 1984), 4C11, 2G8, 8G4, and 5F8 (Yang et al., 1993; Yanushevskaya et al., 1993), and the rat antibodies B1, B3, B5, and B6 (Fievet et al., 1989; Pease et al., 1990). The production and specificity of each have been described previously, as indicated. MAbs 4C11, 2G8, 8G4, and 5F8 were raised in the Cell Engineering Group of the Institute of Experimental Cardiology (Moscow, Russia); the others were generous gifts from Dr. S.G. Young (Mb43, Mb47) of the Gladstone Foundation, San Francisco, and from Dr. J.C. Fruchart (BL3, BL7, B1, B3, B5, and B6) of the Institut Pasteur, Lille, France. Peroxidase-conjugated goat anti-mouse and anti-rat IgG were purchased from Jackson ImmunoResearch Lab, Inc.

Immunoreactivity  of  ox-LDL 

Competitive ELISA with different MAbs was used to compare LDL immunoreactivity before and after oxidation. All dilutions of antibody and LDL samples were prepared in 10 mM phosphate-buffered saline, pH 7.4, containing 0.5% bovine serum albumin (BSA, Fraction V, Boehringer Mannheim Corp.) and 0.1% Tween 20 (Sigma) (PBS-BSA-Tw).

Microtitration plates (96-well, Corning) were coated with LDL (20 µg/ml in PBS, 100 µl per well) by incubation at 4°C overnight. The plates were washed with PBS + 0.5% BSA, incubated with 300 µl of this solution for 1 hour at ambient temperature, and then washed once with PBS-BSA-Tw. The well contents were discarded, and 50 µl of serially diluted LDL samples (ox-LDL) were mixed in the wells with 50 µl of MAb. The optimal concentration of each MAb had been determined beforehand as the midpoint of the MAb dose-dependent binding curve with immobilized LDL; these concentrations were 500 ng/ml for B1 and B6, 12 µg/ml for B3, 600 ng/ml for B5, 40 ng/ml for 8G4, 160 ng/ml for 4C11 and 2G8, 15 ng/ml for BL3, and 20 ng/ml for BL7. Mb43 and Mb47 were used as ascites diluted 1:60000 and 1:20000, respectively. Plates were allowed to stand at ambient temperature for 2 hours and then washed, and the secondary antibody (HRP-conjugated goat anti-mouse or goat anti-rat IgG) was added at an appropriate dilution (100 µl per well). After incubation for 1 hour at ambient temperature and thorough washing with PBS, the plates were assayed for peroxidase activity. The substrate mixture (100 µl per well) contained 1 mg o-phenylenediamine in 10 ml of 20 mM citrate buffer, pH 4.7, and 15 µl H2O2; the reaction was stopped with 25 µl of 5 M H2SO4.
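The competitive ELISA is read out as the competitor (LDL apoB) concentration that displaces 50% of maximal MAb binding, as plotted for 4C11 in Figure 5B. The sketch below shows one common way to estimate that value: linear interpolation of percent binding against log10(concentration) between the two dilutions that bracket 50%. The function names and the choice of log-linear interpolation are assumptions for illustration. The authors may have fitted the full curve instead. A lower 50% value for modified LDL than for native LDL indicates enhanced immunoreactivity.

```python
import math
from typing import Optional, Sequence


def percent_of_max(a_sample: float, a_max: float, a_blank: float = 0.0) -> float:
    """Absorbance expressed as % of maximal MAb binding (no competitor)."""
    return 100.0 * (a_sample - a_blank) / (a_max - a_blank)


def conc_for_50pct(concs: Sequence[float], pct: Sequence[float]) -> Optional[float]:
    """Competitor concentration giving 50% displacement, interpolating
    % binding linearly against log10(concentration)."""
    points = sorted(zip(concs, pct))  # ascending competitor concentration
    for (c1, p1), (c2, p2) in zip(points, points[1:]):
        if p1 >= 50.0 >= p2 and p1 != p2:
            f = (p1 - 50.0) / (p1 - p2)
            log_c = math.log10(c1) + f * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None  # the dilution series does not bracket the 50% point


def relative_immunoreactivity(c50_native: float, c50_modified: float) -> float:
    """Ratio > 1 means less modified LDL is needed to compete (enhanced reactivity)."""
    return c50_native / c50_modified
```

The same interpolation could locate the midpoint of a direct MAb dose-response curve, as used above to choose working MAb concentrations, provided the bracketing test is flipped for an increasing curve.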


RESULTS 

LDL Oxidation in the Presence of Cu²⁺

The physico-chemical properties of LDL changed during incubation with Cu²⁺. As expected, apoB fragmentation was observed by SDS-PAGE (Figure 2), and LDL mobility in native agarose gel increased with oxidation time (Figure 3). A time-dependent increase in TBARS concentration was found in Cu²⁺-treated LDL; this effect was abolished by the concomitant presence of EDTA or BHT.


Figure 2. The 5%-25% SDS-PAGE of LDL after different periods of oxidation: 1 - 1 h; 2 - 2 h; 3 - 4 h; 4 - 8 h; 5 - 12 h; 6 - 16 h; 7 - 24 h; 8 - native LDL.
Figure 3. Non-denaturing agarose-gel electrophoresis of LDL after different periods of oxidation: 1 - 0 h (native LDL); 2 and 3 - 1 h; 4 - 2 h; 5 and 6 - 4 h; 7 and 8 - 8 h; 9 and 10 - 24 h; 11 - LDL incubated for 4 h at 37°C without Cu²⁺ and oxygen flow; 12 - 24 h at 37°C without Cu²⁺ and oxygen flow.


Immunoreactivity of apoB upon Cu²⁺-Mediated Oxidation of LDL

MAbs against different apoB epitopes (Figure 4) were used to test the immunoreactivity of ox-LDL. MAb 5F8-HRP was used for apoB quantification in native and oxidatively modified LDL because the expression of this epitope seemed to be relatively independent of the lipid environment. The behavior of different apoB epitopes after oxidation varied widely. No changes in the interaction of ox-LDL with MAbs 5F8 and 8G4 (residues 1-1297) were observed. Mb47 (3441-3569), the antibody to the apoB-100 region involved in LDL-receptor interaction (Young et al., 1986), did not display any change in binding to ox-LDL; however, its immunoreactivity with acetylated LDL and MDA-LDL decreased significantly, 4- and 20-fold, respectively. The apoB epitope interacting with MAb B6 (2239-2331) showed enhanced immunoreactivity during the first 4 hours of oxidation and then a gradual decline in immunoreactivity upon prolonged incubation with Cu²⁺. MAb B1 (405-539), in contrast, had slightly reduced binding affinity in the first 8 hours of LDL oxidation, after which an apparent increase in affinity was observed; however, these changes were not statistically significant. Six MAbs, 2G8 (3728-4306), BL3 (4235-4355), Mb43 (4027-4081), BL7 (in the vicinity of residue 2331), B3 (2239-2331), and B5 (1854-1878), displayed considerably reduced binding affinity for ox-LDL. Consistent changes in ox-LDL immunoreactivity were observed with MAb 4C11 (Figure 5A): the immunoreactivity of the epitope recognized by 4C11 increased as a function of time. ApoB concentrations required for 50% displacement of maximal 4C11 binding at different times of LDL oxidation are shown in Figure 5B. A progressive increase in ox-LDL immunoreactivity was observed up to 16 h of oxidation, and the immunoreactivity of ox-LDL toward 4C11 remained stable up to 40 h of LDL oxidation. Enhanced binding of 4C11 to MDA-LDL and acetylated LDL was also observed (Figure 5C).

Figure 4. Schematic presentation of the LDL particle and apoB-100 epitopes. ApoB-100 (solid black line) is shown located on the surface of the LDL particle (faded dark circle) and partly buried in the lipid phase. This LDL model is based on that proposed by Yang et al. (1989). ApoB-100 fragments T2, T3, and T4 generated by thrombin cleavage are shown. The approximate location of each apoB-100 epitope studied is indicated by an arrow with the name of the corresponding MAb. Numbers in parentheses define the amino acid residues of the corresponding immunoreactive apoB-100 fragment. (From Valentinova, N.V.; Gu, Z.W.; Yang, M.; Yanuchevskaya, E.V.; Antonov, I.V.; Guyton, J.R.; Smith, C.V.; Gotto, A.M., Jr.; and Yang, C.Y., Biol. Chem. Hoppe-Seyler, 375, Oct. 1994, With Permission.)

Figure 5. A) Interaction of 4C11 with ox-LDL (competitive ELISA); time of oxidation: 0 h (O), 1 h (•), 2 h (V), 4 h (T), 8 h (□), 16 h (■), and 24 h (A); the concentration of 4C11 was 160 ng/ml. B) Concentration of ox-LDL apoB required for 50% displacement of maximal 4C11 binding in competitive ELISA. C) Interaction of 4C11 with modified LDL: (□) MDA-LDL, (■) acetylated LDL, (O) native LDL, and (A) ox-LDL, 24 h of oxidation. MDA-LDL and acetylated LDL were prepared as described elsewhere. (From Valentinova, N.V.; Gu, Z.W.; Yang, M.; Yanuchevskaya, E.V.; Antonov, I.V.; Guyton, J.R.; Smith, C.V.; Gotto, A.M., Jr.; and Yang, C.Y., Biol. Chem. Hoppe-Seyler, 375, Oct. 1994, With Permission.)
DISCUSSION

Changes in the immunoreactivity of an epitope, both increases and decreases, may result from direct modification of amino acid residues such as lysine, tyrosine, arginine, and histidine, and from oxidative cleavage of peptide bonds. Conformational changes of apoB domains caused by fragmentation of the polypeptide chain or by changes in the lipid microenvironment are probably also involved. Twelve MAbs were used in the present study. Six of them (B5, B3, BL7, 2G8, Mb43, and BL3) displayed a significant decrease in binding to ox-LDL. All these epitopes are located in the middle and carboxy-terminal regions of apoB-100.

It is interesting to note that MAbs B3, B6, and BL7 recognize closely located epitopes, yet their patterns of binding to ox-LDL differed. These epitopes were originally mapped to the same apoB region, i.e. between residues 2239 and 2331 (Pease et al., 1990), but the fine epitope specificity of these MAbs appears to differ. Epitopes BL7 and B3 showed progressive decreases in immunoreactivity (more pronounced for B3), whereas B6 displayed a slight increase in immunoreactivity during the first 4 h of LDL oxidation, followed by a gradual decrease.

The epitope for Mb47 was expected to display marked changes upon oxidation because the affinity of the B,E receptor for ox-LDL is much lower than for native LDL (Steinbrecher et al., 1987). However, no changes in its immunoreactivity were observed in our experiments. Our results agree with those reported by Negri et al. (1993), which show the Mb47 epitope to remain unchanged in ox-LDL. In contrast, MDA-LDL and acetylated LDL displayed a significant decrease in binding to Mb47, possibly because of a higher degree of amino group modification. It is possible that modification of amino acid residues located in other regions of apoB influences the receptor-binding properties of ox-LDL.

MAb 8G4 revealed no change in binding to ox-LDL, and only minor changes were observed for 5F8-HRP and B1 (405-539). Our data suggest that the N-terminal region of apoB-100 (thrombin-digest fragment T4) is less susceptible to Cu²⁺-mediated LDL oxidation than the C-terminal and middle regions of apoB and may largely retain its secondary structure. Results reported earlier (Yang et al., 1989) showed that the N-terminal domain of apoB-100 (residues 1-1297) contains 15 of the 25 cysteines, and 14 of them occur in intramolecular disulfide bonds (Yang et al., 1990), which apparently stabilize the tertiary structure of this region.

In the present study, noteworthy changes in the structure of the epitope recognized by MAb 4C11 were observed. A progressive increase in 4C11 binding affinity was shown to be a function of oxidation time and of TBARS concentration in the ox-LDL samples. MAb 4C11 showed a sustained increase in its affinity even for severely oxidized LDL, up to 40 h of oxidation. No intact apoB was detected in this LDL by SDS-PAGE. Hence, the marked immunoreactivity seems to be attributable to apoB fragments still associated with the lipid matrix (Zawadzki et al., 1989). The observed increase in binding of 4C11 to acetylated LDL and MDA-LDL suggests that amino group modification may be responsible for the changes in 4C11 interaction with ox-LDL. Both the type and the degree of modification seem to influence 4C11 binding: the highest affinity of 4C11 was observed for MDA-LDL, which has the highest degree of modification (about 87% of reactive amino groups); however, ox-LDL had higher immunoreactivity than acetylated LDL despite a lower degree of modification (about 43% versus 70%). Our results demonstrate that an immunological approach can be used to study the structure of ox-LDL and that the epitope for 4C11 has the potential to become a useful marker for monitoring LDL oxidation.
INTRODUCTION

The development of new leads for drug design, as well as structure/function relationship studies, was revolutionized by the introduction of combinatorial or “library” techniques (for reviews see, e.g., Moos et al., 1993; Gallop et al., 1994; Gordon et al., 1994). These techniques allow the generation and screening of millions of potentially active structures. Owing to their well-developed and finely tuned synthetic methodology, peptides were the first group of compounds evaluated by this new approach. The next logical challenge, however, is to synthesize libraries of nonpeptidic structures. The combinatorial library approach applied at Selectide consists of three basic steps: (i) chemical synthesis based on the split synthesis method, yielding a library with one test compound per bead; (ii) screening of the library using either an on-bead binding assay or a multistep release assay; and (iii) recovery of positive beads and determination of the structure of the test compound (Lam et al., 1991).
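The "one compound per bead" property of split synthesis follows from the procedure itself. At every coupling step the pooled beads are split into one portion per building block, each portion is coupled with a single block, and the portions are remixed. The simulation below is a minimal sketch of that bookkeeping. The function names, the round-robin assignment of shuffled beads to equal portions, and the Poisson-style coverage estimate are illustrative assumptions, not Selectide's protocol.

```python
import random
from itertools import product


def split_and_mix(n_beads, blocks, n_steps, seed=0):
    """Simulate split synthesis; returns the single compound carried by each bead."""
    rng = random.Random(seed)
    beads = [[] for _ in range(n_beads)]
    for _ in range(n_steps):
        order = list(range(n_beads))
        rng.shuffle(order)  # mix: pool and randomize all beads
        # split: assign shuffled beads round-robin to len(blocks) equal portions,
        # then couple every bead in portion k with building block k
        for rank, bead in enumerate(order):
            beads[bead].append(blocks[rank % len(blocks)])
    return [tuple(b) for b in beads]


def all_compounds(blocks, n_steps):
    """Every possible compound: len(blocks) ** n_steps sequences."""
    return set(product(blocks, repeat=n_steps))


def expected_coverage(n_beads, library_size):
    """Expected fraction of possible compounds present on at least one bead."""
    return 1.0 - (1.0 - 1.0 / library_size) ** n_beads
```

The coverage function makes the practical point that using as many beads as there are possible compounds represents only about 63% of the library. Several-fold more beads are needed for near-complete representation.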


CHEMICAL  LIBRARY  TYPES 

Each chemically synthesized combinatorial library represents a certain structural diversity and multiplicity. Libraries built from sequential repetition of amino acids (peptide libraries) are easy to synthesize, and the structure of a compound of interest can be easily determined by sequencing. However, such libraries do not have very high structural diversity, since the only variable parameter is the type of side chain connected to the C-alpha carbon of the peptide backbone, and those side chains occupy only a limited conformational space. Combining natural L-amino acids with D-amino acids brings more diversity; nevertheless, it is still quite limited. Over the last three years we have synthesized and screened approximately 400 peptide libraries. These libraries ranged from linear (with exposed N- or C-terminus), to cyclic (homodetic or heterodetic), to libraries with a high probability of regular structural features (alpha helix, beta turn), covering most of the conformational space that can be explored by a peptide structure with a molecular weight below 1000.

The advent of non-peptide libraries increased the conformational space filled by the test compound subunits, as well as the chemical diversity arising from the nature of the subunits (see, e.g., Simon et al., 1992; Cho et al., 1993; DeWitt et al., 1993; Nikolaiev et al., 1993; Bunin et al., 1994; Chen et al., 1994; Gordon et al., 1994; Lebl et al., 1994; Stankova et al., 1994). Combinatorial libraries of chemically synthesized compounds can be classified into several distinct groups, each representing a certain structural type: (i) libraries of small, compact, and relatively rigid structures (e.g. N-acyl-N-alkyl amino acids); (ii) libraries based on a more or less rigid scaffold structure (usually a multifunctional cyclic scaffold, e.g. a derivatized cyclopentane or cyclohexane ring, a functionalized steroid skeleton, tricarboxybenzene, or diaminobenzoic acid); (iii) libraries based on a flexible scaffold that is built during the synthesis of the library and can itself be randomized (branched scaffold based on diamino acids, α, β, γ, δ-library); (iv) libraries of linear, sequential compounds (a typical example is the peptide library, which also includes N-substituted glycines, i.e. peptoids, and libraries containing α-, β-, and γ-amino acids); (v) libraries of small organic molecules (e.g. benzodiazepine type). The library types we have explored are shown in figure 1.
Figure 1. Structures of the nonpeptidic library types studied.
DETERMINATION OF POSITIVELY REACTING STRUCTURES

Once the bead of interest is selected by the screening protocol, it is necessary to determine the structure of the test compound responsible for the observed effect. Peptide structures can be easily determined by sequencing with automatic microsequencers. Structure determination of hits from nonpeptide libraries is complicated by the limited amount of positively reacting compound. A standard bead of 100 µm diameter carries approximately 100 pmol of the functional group onto which the library is built. This amount of an organic structure does not allow the application of modern analytical methods for structure elucidation. The only exception is mass spectrometry, which can be applied when the library is composed of a limited number of structures or when the fragmentation patterns are known and predictable. An example of mass spectrometric structure determination is shown in figure 2. Beads binding streptavidin were selected from a small library of N-acyl-N-alkyl amino acids. The compound was cleaved from the bead, and all components of the resulting mixture were analyzed by MS/MS. The deduced structures were resynthesized, and their mass spectra matched those obtained for the components cleaved from the beads. Binding to streptavidin was verified by a solution assay (Stankova et al., 1994). The structures from a library based on the attachment of carboxylic acids to a modified Kemp's triacid scaffold were also analyzed by the MS/MS technique (figure 3).
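To see why about 100 pmol per bead is so limiting, it helps to convert it to a mass. The arithmetic is pmol × g/mol = pg. The 500 Da molecular weight below is a hypothetical value typical of a small library compound and is not taken from the text.

```python
def mass_per_bead_ng(loading_pmol: float, mol_weight: float) -> float:
    """Mass of compound on one bead: pmol x g/mol gives pg; divide by 1000 for ng."""
    return loading_pmol * mol_weight / 1000.0


# Hypothetical 500 Da compound at the ~100 pmol loading quoted in the text
print(mass_per_bead_ng(100, 500))  # 50.0 (ng)
```

A few tens of nanograms per bead is why the text singles out mass spectrometry, or a coding strategy, for hits from nonpeptide libraries.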

In cases where mass spectrometry cannot be used, a coding principle is applied (Brenner & Lerner, 1992; Kerr et al., 1993; Needels et al., 1993; Nielsen et al., 1993; Nikolaiev et al., 1993; Ohlmeyer et al., 1993). The various formats of coded libraries are given in figure 4. Linear coding is based on parallel synthesis of the screening and coding structures. Fractional coding is realized in two ways: (i) simultaneous coupling of a tag together with the tagged building block, e.g. coupling 0.05 equivalents of norleucine together with a D-amino acid to identify the configuration of that amino acid during sequencing; or (ii) capping part of the growing chain with a tag that can later be cleaved and identified as such (Ohlmeyer et al., 1993) or as a tagged molecule (Sepetov, 1992; Youngquist et al., 1994). This last possibility is illustrated in figure 5, which shows the tagging of a growing peptide chain by bromobenzoylation. After cleavage of the mixture of full-length peptide and truncated bromobenzoylated fragments, mass spectrometric evaluation allows the elucidation of the peptide sequence (Sepetov, 1992). Binary coding uses a mixture of several coding blocks, instead of a single one, to code each building block of the screening structure. Using a different set of coding blocks for different positions in the library allows the coding structure to be constructed in such a way that the coding blocks are cleaved and analyzed in a single step (Ohlmeyer et al., 1993).
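The economy of binary coding comes from writing each building block's index as a binary number. Each "1" bit is recorded by the presence of one coding block, and each randomized position has its own set of coding blocks. The sketch below illustrates that idea in the spirit of the scheme cited above. Tags are abstract `(position, bit)` labels standing in for chemically distinct coding blocks. Detection is assumed to be error-free. Codes start at 1 so that every position leaves at least one detectable tag. These are all illustrative assumptions, not the published chemistry.

```python
def tags_needed(n_blocks: int) -> int:
    """Coding blocks per position so that codes 1..n_blocks are all distinct."""
    return n_blocks.bit_length()


def encode(compound):
    """Map building-block indices (one per position) to a set of (position, bit) tags."""
    tags = set()
    for pos, idx in enumerate(compound):
        code = idx + 1  # non-zero code, so each position yields at least one tag
        bit = 0
        while code:
            if code & 1:
                tags.add((pos, bit))  # this coding block is coupled to the bead
            code >>= 1
            bit += 1
    return tags


def decode(tags, n_positions):
    """Recover building-block indices from the tags detected on one bead."""
    codes = [0] * n_positions
    for pos, bit in tags:
        codes[pos] |= 1 << bit
    return [code - 1 for code in codes]
```

With 20 building blocks per position, five coding blocks per position suffice instead of twenty. That saving is what lets all tags be cleaved and read in a single analysis.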

Nature has encoded proteins in nucleic acids for ages. Coding nonpeptidic compounds by peptide structures, however, is also robust and reliable (Kerr et al., 1993; Nikolaiev et al., 1993). Each chemical entity in the synthetic library is independently coded by a peptide whose composition can be easily resolved by an established technique (Edman degradation). The synthesis scheme of a coded library is shown in figure 6. Peptidic tags can be constructed in such a way that one cycle of Edman degradation cleaves all coding amino acids and a single HPLC run reveals all components. The structure of a coding molecule (more precisely, a mixture of coding molecules) is shown in figure 7, together with the HPLC trace of the product of one Edman degradation cycle of this molecule.

The coding principle bears one inherent complication: if screening is performed on the bead, the coding structure can interact with the target molecule. Three possibilities exist to prevent interaction of the coding structure with the target: (i) the coding structure can be present at a very low concentration, so that its interaction with the target molecule is not seen under the conditions of the experiment; (ii) the coding and test structures can be physically separated; (iii) the test structure can be coded by a multiplicity of coding structures. The first possibility is not realistic for peptide coding because of the limited sensitivity of peptide sequencing. However, it can be used advantageously for coding by nucleic acids, where the coding structure can be conveniently amplified (Needels et al., 1993). The second option was explored by us recently (Vagner et al., 1994). The “surface” of the bead, which is available for interaction with the macromolecular target, was separated from the “interior” of the bead by enzymatic “shaving”, and the coding structure was coupled to this target-inaccessible “interior”. The last possibility is based on the idea of coding with a set of different structures rather than one unique structure. This set of structures must provide unambiguous information about the chemistry performed on the screening arm.

Figure 3. MS/MS spectrum of a compound from the library constructed on Kemp's triacid. Spectrum of the bead-bound compound (upper trace) and of the resynthesized compound.

Figure 4. Coded library formats.
Figure 5. Coding by bromobenzoyl cap.
Figure 6. Synthesis scheme of a coded library.
INTRODUCTION

The virion of the filamentous bacteriophage fd (f1 and M13 are very similar strains) is a flexible rod about 1 µm long and 6 nm in diameter, comprising a tubular sheath of approx. 2700 copies of the major coat protein subunit surrounding a DNA core. The DNA is a single-stranded circular molecule of 6408 nucleotides carrying 10 genes; these genes are tightly packed and in some instances overlapping, apart from a short region (the intergenic space) that encodes no protein component but contains a double-stranded helical hairpin loop responsible for initiating assembly of the virion. Each end of the virion carries a few copies (about 5) of each of two minor coat proteins: gVIIp and gIXp at the end where assembly is initiated, and gIIIp and gVIp at the end where the process is terminated [for general reviews, see Model & Russel, 1988; Russel, 1991].

The major coat protein (gVIIIp) contains 50 amino acid residues and is largely
α-helical (Glucksman et al., 1992; Marvin et al., 1994). It has a tripartite structure: an
N-terminal segment that is rich in acidic and hydrophilic residues; a 19-residue stretch of
apolar and hydrophobic amino acids; and a C-terminal region rich in basic residues. In the
virion, these protein subunits are in a shingled, helical array, with five subunits in the 16.26
Å axial repeat. The long axes of the helices make a small angle with the axis of the virion
and their N-terminal segments occupy the outside of the particle. The first four or five amino
acids are conformationally mobile, as judged by NMR spectroscopy (Colnago et al., 1987),
in keeping with their location at the viral surface. The apolar regions generate a hydrophobic
girdle of protein-protein interactions, leaving the positively charged C-terminal regions to
line the inside of the tubular capsid, where they can interact with the negatively charged
sugar-phosphate backbone of the DNA, a mechanism of encapsidation that obviates the need
for DNA sequence specificity (Hunter et al., 1987; Glucksman et al., 1992; Marvin et al.,
1994).

One  of  the  most  exciting  new  technologies  of  recent  years  is  the  display  of  foreign 
peptides  and  proteins  on  the  surface  of  filamentous  bacteriophage  particles.  The  peptide  or 
protein  is  encoded  by  DNA  inserted  at  an  appropriate  site  in  the  structural  gene  of  one  of  the 
virus  coat  proteins;  provided  the  modified  coat  protein  remains  compatible  with  virus 
assembly,  the  foreign  polypeptide  will  appear  on  the  surface  of  the  progeny  virion  where  it 
can  act  as  a  ligand  or  effector  in  a  wide  variety  of  biologically  important  systems  [for  recent 
reviews,  see  Scott  (1992),  Cesareni  (1992)  and  Smith  (1993)].  If  random  DNA  sequences 
are  used  as  inserts,  the  corresponding  amino  acid  sequences  form  vast  libraries  of  displayed 
peptides.  These  can  be  screened  in  various  ways  as  a  powerful  means  of  identifying  novel 
biologically  active  peptides  or  of  testing  large  numbers  of  mutants  for  loss  or  acquisition  of 
biological  activity. 

We describe here a series of experiments that has led to a deeper understanding of
the unusual mode of DNA-protein interaction that underlies the process of DNA encapsidation,
the parameters that govern peptide display on gVIIIp, and the immunological properties
of peptides displayed in this way.


DIRECTED  MUTAGENESIS  OF  THE  MAJOR  COAT  PROTEIN 

The C-terminal segment of the major coat protein contains 4 lysine residues (positions
40, 43, 44 and 48). The importance of their positively charged side-chains in interacting
with the negatively charged sugar-phosphate backbone of the DNA has been emphatically
proved by directed mutagenesis of the phage gene VIII encoding the major coat protein. If
Lys48 is replaced by a neutral amino acid (Gln, K48Q; Thr, K48T; or Ala, K48A), viable
mutant  virions  are  produced  but  are  found  to  be  35%  longer  than  the  wild-type  particles.  On 
the  other  hand,  if  Lys48  is  replaced  with  arginine  (K48R),  which  conserves  the  positive 
charge  on  the  side-chain,  the  length  of  the  virion  is  unchanged  (Hunter  et  al.,  1987).  Further 
experiments  have  shown  that  the  K48E  mutant  protein,  which  would  further  lower  the 
positive  charge  density  inside  the  capsid,  is  unacceptable,  unless  it  is  incorporated  into  hybrid 
virions  that  contain  some  wild-type  or  K48Q  mutant  coat  proteins  to  help  restore  the  positive 
charge,  at  least  in  part  (Rowitch  et  al.,  1988).  These  hybrid  virions  are  all  longer  than  the 
wild-type  but  their  lengths  are  variable,  depending  on  the  ratio  of  K48E  to  wild-type  or  K48Q 
coat  proteins  in  the  capsid  (Rowitch  et  al.,  1988). 

These experiments are all consistent with the proposal (Marvin & Wachtel, 1976;
Marvin, 1978) that there is direct but non-specific electrostatic interaction between the DNA
and coat protein in filamentous bacteriophages. A reduction in the positive charge density
per unit length inside the protein sheath of the K48Q, K48T or K48A virions would require
a matching fall in the negative charge density per unit length of the DNA core. This would
be achieved most simply by an elongation of the encapsidated DNA, leading to a corresponding
increase in the length of the virion in which the protein-protein interactions remain
essentially unchanged (Hunter et al., 1987). Similarly, the existence of the hybrid virions
with  variable  particle  lengths  can  be  explained  by  postulating  that  the  number  of  nucleotides 
packaged  per  coat  protein  subunit  is  not  restricted  to  any  particular  value  and  that,  depending 
on  the  positive  charge  density  generated  by  the  protein  sheath,  the  DNA  must  adopt  a 
compatible  electrostatic  (and  thus  spatial)  arrangement  during  the  elongation  phase  of  virus 
assembly  (Rowitch  et  al.,  1988;  Greenwood  et  al.,  1991a). 
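
The charge-matching argument can be made semi-quantitative with a deliberately simple model. The sketch below is our illustration, not the authors' calculation: it assumes that virion length scales inversely with the mean net charge per subunit contributed by the four C-terminal lysines (Lys counted as +1, Glu as -1), with DNA content and all other charges unchanged. Under these assumptions a neutral replacement at position 48 (4 to 3 charges) predicts an elongation of 4/3, about 33%, close to the observed 35%, and hybrid capsids give intermediate lengths that depend on the mixing ratio. The function and table names are hypothetical.

```python
# Simplified charge-matching model (illustrative assumptions, not measured data):
# - only Lys40, Lys43, Lys44 and Lys48 are counted (+1 each);
# - Glu counts as -1; every other charge and the DNA content are unchanged;
# - virion length is inversely proportional to mean charge per subunit.

WILD_TYPE_CHARGE = 4  # four C-terminal lysines

# Net C-terminal charge per subunit under the simplified model
SUBUNIT_CHARGE = {
    "wt": 4,    # wild type
    "K48R": 4,  # Arg keeps the positive charge
    "K48Q": 3,  # neutral replacements lose one charge
    "K48T": 3,
    "K48A": 3,
    "K48E": 2,  # loses +1 and gains -1
}


def relative_length(composition: dict[str, float]) -> float:
    """Predicted virion length relative to wild type for a capsid whose
    subunits are mixed in the given fractions (fractions must sum to 1)."""
    total = sum(composition.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    mean_charge = sum(SUBUNIT_CHARGE[kind] * frac for kind, frac in composition.items())
    return WILD_TYPE_CHARGE / mean_charge


print(relative_length({"wt": 1.0}))                        # 1.0
print(relative_length({"K48R": 1.0}))                      # 1.0
print(round(relative_length({"K48Q": 1.0}), 3))            # 1.333
print(round(relative_length({"wt": 0.5, "K48E": 0.5}), 3)) # 1.333
print(round(relative_length({"wt": 0.75, "K48E": 0.25}), 3))
```

The model says nothing about viability: it would predict a pure K48E capsid twice the wild-type length, whereas K48E subunits are tolerated only in hybrid virions.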

FIBRE DIFFRACTION ANALYSIS OF BACTERIOPHAGE FD

We have now obtained direct structural evidence in support of this interpretation by
means of X-ray fibre diffraction analysis of bacteriophage fd and the K48A mutant (Symmons
et al., 1995). The overall distribution of intensity and the pattern of layer lines were
found to be essentially unchanged by the K48A mutation. There is an intrinsic ambiguity in
fibre patterns that can make it difficult to distinguish integral differences in helical symmetry
(discussed by Marvin, 1978; Nave et al., 1981). In the case of bacteriophage fd, this could
obscure a change in the number of protein subunits per 16.26 Å axial repeat from the
wild-type value of five to four or six in the mutant. However, no such change was detected
by comparative sedimentation analysis of wild-type fd and the K48A or K48Q mutants
(Molina-Garcia et al., 1992). We conclude that the structure of the coat protein subunit and the helical
symmetry of its packing in the capsid are essentially unchanged in the K48A mutant.

The distribution of intensity along the equator of a fibre diffraction pattern gives
direct information about the radial electron density distribution in a single virion. In hydrated
gels of fd and K48A, the virions are sufficiently far apart for them to diffract as individual
particles [see Wachtel et al. (1974)] and the molecular transform of the virion can be
measured directly from these patterns. The first maximum on the equator at about R = 0.025
Å⁻¹ is stronger (relative to neighbouring maxima) for the K48A mutant than for wild-type
(Figs. 1a and 1b). This kind of change in intensity was also noted for the equatorial intensity
of the filamentous bacteriophage strain Pf1 relative to strain If1 (Wachtel et al., 1974), and
is best explained by a reduction in the DNA:protein ratio. The effects on the low-resolution
electron density distribution attributable to these changes in the diffraction pattern are
illustrated in Fig. 1c. There is a reduction in electron density of the K48A mutant relative to
wild type at a radius of about 4 Å, corresponding to the average radius of the DNA. The
reduction in electron density at about 12 Å radius may correspond to the replacement of the
side-chain of Lys48 by the methyl group of alanine, with a displacement of protein to occupy
space vacated by DNA. There may also be parts of the DNA (phosphate?) at 12 Å. However,
it is abundantly clear that the electron density in the centre of the virions, attributable to the
DNA core, is lower in the K48A mutant than in wild-type fd. Moreover, the difference in
electron density distribution near the centre is consistent with a reduction in the mass of
DNA per protein subunit in the K48A mutant to 75% of its wild-type value, as expected for
the same amount of DNA packaged in a virion that is 35% longer.
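
The 75% figure follows from simple bookkeeping, sketched below under stated assumptions: about 2700 gVIIIp subunits per wild-type virion, the 6408-nucleotide genome, five subunits per 16.26 Å axial repeat in both wild type and K48A (the helical symmetry being essentially unchanged), and a K48A virion 35% longer. The variable names are ours and purely illustrative.

```python
# Bookkeeping behind the "75% of wild-type DNA per subunit" estimate.
# Assumptions: ~2700 gVIIIp per wild-type virion; 5 subunits per 16.26 Å
# repeat in both wild type and K48A; K48A virion 35% longer.

NUCLEOTIDES = 6408        # fd genome, single-stranded circular DNA
WT_SUBUNITS = 2700        # approximate gVIIIp copies per wild-type virion
SUBUNITS_PER_REPEAT = 5   # five subunits per axial repeat
REPEAT_NM = 1.626         # 16.26 Å axial repeat, expressed in nm
ELONGATION = 1.35         # K48A virion length relative to wild type

# Unchanged helical symmetry means subunit number scales with virion length
mutant_subunits = WT_SUBUNITS * ELONGATION

wt_nt_per_subunit = NUCLEOTIDES / WT_SUBUNITS
mutant_nt_per_subunit = NUCLEOTIDES / mutant_subunits
ratio = mutant_nt_per_subunit / wt_nt_per_subunit  # equals 1 / ELONGATION

wt_length_nm = WT_SUBUNITS / SUBUNITS_PER_REPEAT * REPEAT_NM

print(f"wild type: {wt_nt_per_subunit:.2f} nt per subunit, length ~{wt_length_nm:.0f} nm")
print(f"K48A:      {mutant_nt_per_subunit:.2f} nt per subunit, "
      f"length ~{wt_length_nm * ELONGATION:.0f} nm")
print(f"DNA per subunit, K48A / wild type: {ratio:.2f}")
```

The ratio 1/1.35, about 0.74, matches the roughly 75% reduction inferred from the radial density profile, and the computed wild-type length (about 0.88 µm) is consistent with the "about 1 µm" quoted in the Introduction.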


THE  MECHANISM  OF  DNA  PACKAGING 

Taken together, the results of directed mutagenesis and X-ray fibre diffraction show
that  lowering  the  positive  charge  density  per  unit  length  inside  the  protein  tube  forces  the 
DNA  to  increase  the  length  it  occupies  by  adopting  a  more  elongated  configuration,  thereby 
lowering  its  negative  charge  density  per  unit  length  in  a  matching  process.  A  longer  virion 
is  thus  required  to  package  the  same  amount  of  DNA.  Moreover,  given  the  existence  of 
hybrid  virions  with  varying  lengths,  the  number  of  nucleotides  packaged  per  coat  protein 
subunit  is  not  restricted  to  any  particular  value.  The  capsid  can  thus  be  regarded  as  a  protein 
sheath  lined  with  positive  charges  interacting  electrostatically  and  non-specifically  with  a 
negatively-charged  DNA  core  of  matching  charge  density,  and  with  the  length  of  the  virion 
dictated  by  the  length  that  the  DNA  molecule  is  required  to  adopt. 

It has previously been shown that the replacement of Ser47 in the major coat protein
with lysine, which would raise the positive charge density lining the capsid, is unacceptable
(Greenwood et al., 1991a). This suggests that it is impossible to force the DNA to adopt a
shorter length than it does in the wild-type virion, probably because the central hole it
occupies is too narrow to accommodate the compression. We can thus regard the virus as
having reached evolutionary perfection in packaging its DNA in a capsid containing the
minimum number of coat protein subunits: one extra positive charge in the C-terminal
segment is not tolerated, whereas with one fewer, 35% more coat protein subunits are required
to package the same amount of DNA (Greenwood et al., 1991a).

Figure 1. X-ray fibre diffraction patterns of filamentous bacteriophage fd and radial electron density
distributions. a) Gel of wild-type fd, prepared at pH 6.5, showing the strong l = 0 and l = 1 layer lines for
c = 32.52 Å; b) gel of K48A mutant, prepared at pH 6.5, layer lines as in a). The fibre axis is vertical and the
sharp meridional reflexions near the top and bottom are at 8.13 Å. c) Radial electron density distribution
calculated for wild-type fd (continuous curve) and K48A mutant (broken curve). For full details, see Symmons
et al. (1995).


SURFACE  DISPLAY  OF  FOREIGN  PEPTIDES  ON 
BACTERIOPHAGE  PARTICLES 

Since the first description of the incorporation of foreign peptides into a minor coat
protein of filamentous bacteriophage (Smith, 1985), numerous developments and applications
of the concept have been described. Most of the experiments have relied on expression
systems that utilize modification of the phage gene III to allow foreign peptides or proteins
to be inserted at or near the N-terminus of gIIIp (Scott, 1992; Cesareni, 1992; Smith, 1993).
This limits the number of peptides displayed to a few copies at one end of the virion.
However, it is also possible to modify phage gene VIII so that the foreign peptides are
incorporated near the N-terminus of gVIIIp, which ensures prominent exposure of multiple
copies of the peptide on the surface of the virion (Il'ichev et al., 1989; Kang et al., 1991;
Greenwood et al., 1991b; Felici et al., 1991). It turns out that five (Il'ichev et al., 1989) or
up to six (Greenwood et al., 1991b) amino acids can readily be inserted in this way,
generating a recombinant virion in which all 2700 copies of the major coat protein
display the peptide (Fig. 2). However, it proved difficult or impossible to accommodate
larger peptides in recombinant virions. To overcome this, hybrid phage capsids can be
constructed (Kang et al., 1991; Greenwood et al., 1991b; Felici et al., 1991) in which the
modified coat proteins are interspersed with copies of the wild-type protein (Fig. 2).

Figure 2. Schematic diagram showing (a) wild-type bacteriophage fd; (b) recombinant virions; and
(c) hybrid virions. In the recombinant virions, all 2700 copies of the major coat protein are displaying a
peptide insert, whereas in the hybrid virions, mutated coat proteins are interspersed with wild-type coat
proteins.

Depending  on  the  peptide,  up  to  30-40%  of  the  major  coat  protein  subunits  in  a  hybrid 
capsid  can  carry  an  insert  of  12  or  more  amino  acid  residues  (Greenwood  et  al.,  1991b; 
Veronese  et  al.,  1994).  However,  with  appreciably  larger  peptides  or  intact  proteins,  the 
frequency  of  display  falls  substantially:  only  a  few  copies  of  Fab  antibody  fragments  (Kang 
et  al.,  1991;  Huse  et  al.,  1992)  or  trypsin  (Corey  et  al.,  1993)  can  be  incorporated  into  the 
phage  particle.  In  the  X-ray  model  of  the  virion,  there  is  sufficient  room  on  the  surface  of 
the  virion  for  each  N-terminal  region  of  a  gVIIIp  subunit  to  accommodate  peptides  much 
larger  than  the  six  or  so  residues  currently  found  acceptable  (Makowski,  1993).  Recent 
experiments  in  our  laboratory  indicate  that  the  inability  to  achieve  a  high  frequency  of  display 
of  large  peptides  or  proteins  lies,  at  least  in  part,  in  the  insertion  and  processing  of  the 
enlarged  pro-coat  molecule  in  the  bacterial  cell  membrane  where  assembly  takes  place  (A. 
Langara,  L.  Gowda,  R  Malik  and  R.N.Perham,  unpublished  work).  A  deeper  understanding 
of  this  problem  may  lead  to  ways  to  overcome  it. 
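
To put the display formats side by side, the sketch below converts the fractions quoted above into approximate copies per virion. It assumes about 2700 gVIIIp subunits per virion and, for comparison, about five copies of gIIIp; the helper name is hypothetical.

```python
# Approximate peptide copies displayed per virion in the formats discussed.
# Assumed copy numbers: ~2700 gVIIIp and ~5 gIIIp per virion.
MAJOR_COAT_COPIES = 2700
GIIIP_COPIES = 5


def displayed_copies(fraction_modified: float) -> int:
    """Copies of an insert on gVIIIp when the given fraction of subunits carries it."""
    return round(MAJOR_COAT_COPIES * fraction_modified)


print("gIIIp fusion:", GIIIP_COPIES)
print("recombinant gVIIIp virion:", displayed_copies(1.0))
print("hybrid gVIIIp virion, 30-40%:", displayed_copies(0.30), "to", displayed_copies(0.40))
```

Even a hybrid capsid therefore carries two to three orders of magnitude more copies of the insert than a gIIIp fusion.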


IMMUNOLOGICAL  PROPERTIES  OF  DISPLAYED  PEPTIDES 

As expected from their surface location on the virion, peptides displayed close to the
N-terminus of gVIIIp in a bacteriophage capsid are highly immunogenic, in the presence or
absence of adjuvant (Greenwood et al., 1991b; Minenkova et al., 1993), much more so than
peptides on gIIIp (de la Cruz et al., 1988). Anti-wild-type antibodies can be removed by
passage of the antiserum through a column of immobilized wild-type phage particles, leaving
an antibody preparation directed solely against the peptide insert (Greenwood et al., 1991b).
Moreover, the antibody specificity is very high (Willis et al., 1993). Thus we have a new
means of preparing antibodies against peptides, substantially simpler and less expensive than
the conventional methods of chemical synthesis followed by chemical coupling of the
peptide to a suitable carrier protein before injection.

As shown by the poor antibody titre generated in nude BALB/c mice, the immune
response against hybrid bacteriophage epitopes is T-cell dependent, a conclusion supported
by the observation of class-switching from IgM to IgG during the maturation of the response
in heterozygous (nu/+) but not homozygous (nu/nu) nude mice (Willis et al., 1993). The
generation of a strong immune response in the absence of adjuvants suggests that helper
T-cell activity is being stimulated directly, adding to the ease of the procedure as a means of
raising anti-peptide antibody.


STRUCTURAL  MIMICRY  OF  NATURAL  EPITOPES 

The ability of short peptides displayed on filamentous bacteriophage particles to
mimic natural protein epitopes has been tested using the principal neutralizing determinant
of the human immunodeficiency virus HIV. This is an intra-chain disulphide-bridged loop,
designated V3, in the third hypervariable region of the variable surface glycoprotein gp120.
A hybrid filamentous bacteriophage displaying the 12-residue sequence IHIGPGRAFYTT
derived from the tip of the loop turns out to be a remarkably effective structural mimic of
the natural epitope, capable of eliciting neutralizing antibodies in mice against the parental
strain HIV-1 MN and also against related but different strains of HIV such as IIIB and Rutz
(Veronese et al., 1994).

The high level of structural mimicry achieved is emphasized by the fact that in ELISA
assays employing natural anti-HIV antisera from HIV-infected patients, the 12-residue
peptide displayed on a phage particle is at least one and perhaps two orders of magnitude
more sensitive as a substrate than the same peptide covalently coupled to the microtitre plate
by conventional chemical means (Fig. 3). This suggests that the N-terminal region of gVIIIp
on the bacteriophage surface offers a sympathetic folding milieu in which an inserted peptide
can adopt a conformation close to its state in the V3 loop in the HIV virion.

Figure 3. ELISA assays of sera from two different human HIV-1-infected individuals (ps#1, ps#2). The ELISA
assays were conducted with peptides (2 µg/well) or hybrid MN phage and wild-type phage fd (20 µg/well). The
hybrid MN phage is equivalent to 1 µg peptide/well. The MN phage is displaying the peptide IHIGPGRAFYTT
inserted between residues 3 and 4 of the major coat protein. The peptides were: 12-mer, IHIGPGRAFYTT;
23-mer, YNKRKRIHIGPGRAFYTTKNIIG. For full details, see Veronese et al. (1994).

It has similarly been reported that a corrupt 5-residue peptide related to the HIV-1
gag protein, when displayed in a recombinant phage virion, is capable of eliciting rabbit
antibodies that cross-react with the native protein (Minenkova et al., 1993). On the other
hand, using a library of peptides displayed on hybrid bacteriophage particles, a 9-residue
amino acid sequence can be selected as a mimic of a discontinuous epitope of Bordetella
pertussis toxin, but antibodies raised in mice against the modified phage are unable to
recognize the natural antigen (Felici et al., 1993). Nonetheless, it is clear that in the right
circumstances  peptides  displayed  on  filamentous  bacteriophages  are  capable  of  acting  as 
highly  effective  structural  mimics  of  natural  continuous  epitopes  of  native  proteins. 


PEPTIDE  ACCESSIBILITY 

Peptides  displayed  on  gVIIIp  are  efficient  mediators  of  immunological  reactions  of 
various  kinds,  but  there  is  still  little  knowledge  of  the  factors  that  might  govern  the  physical 
accessibility  of  a  peptide  to  a  target  receptor.  X-ray  fibre  diffraction  analysis  of  recombinant 
phage  with  peptides  incorporated  between  residues  3  and  4  of  the  major  coat  protein  has 
revealed  no  significant  change  in  the  helical  parameters  of  the  protein  subunit  packing  (M.F. 
Symmons,  L.C.  Welsh,  C.  Nave,  D.A.  Marvin  and  R.N.  Perham,  unpublished  work).  Thus 
the  structural  model  of  phage  fd  derived  from  X-ray  fibre  diffraction  (Makowski,  1993; 
Marvin et al., 1994) can serve as a sound basis for interpreting studies of peptide accessibility.

A  straightforward  test  of  accessibility  is  to  analyse  the  susceptibility  of  the  peptide 
insert  to  proteolysis  with  defined  proteinases  in  vitro.  Thus,  when  a  recombinant  phage 
particle  with  the  sequence  GPGRAF  inserted  between  residues  3  and  4  in  each  copy  of  gVIIIp 
is  treated  with  trypsin,  there  is  rapid  cleavage  between  the  arginine  and  alanine  residues 
(positions  7  and  8,  respectively,  measured  from  the  N-terminus).  On  the  other  hand,  when 
treated  with  chymotrypsin,  cleavage  after  the  phenylalanine  residue  (position  9)  is  much  less 
facile  (T.D.  Terry  and  R.N.  Perham,  unpublished  work).  This  strongly  suggests  that  a  protein 
receptor comparable in size to the serine proteinases (Mr about 25,000) should be able to
come into intimate contact with peptides inserted between residues 3 and 4 of gVIIIp, but
that  amino  acid  sequences  inserted  further  from  the  N-terminus  will  be  shielded  by  the  bulk 
structure  of  the  virion.  Information  of  this  kind  will  be  valuable  in  the  design  of  future  display 
systems. 
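
The numbering of the cleavage positions can be checked with a short script. This is a sketch under stated assumptions: the first ten residues of mature gVIIIp are taken to be AEGDDPAKAA, and trypsin and chymotrypsin are modelled by the simplified rules "cut after Lys/Arg" and "cut after Phe/Tyr/Trp", in both cases not before Pro. The script finds only the sites the sequence permits; whether a site is actually cut on the virion depends on its accessibility, which is what the experiment measures.

```python
# Assumed N-terminal residues 1-10 of mature gVIIIp and the GPGRAF insert
GVIII_N_TERMINUS = "AEGDDPAKAA"
INSERT = "GPGRAF"


def insert_after(seq: str, position: int, peptide: str) -> str:
    """Insert `peptide` after residue `position` (1-based numbering)."""
    return seq[:position] + peptide + seq[position:]


def cleavage_sites(seq: str, p1_residues: str) -> list[int]:
    """1-based positions of P1 residues after which the protease may cut,
    using the simplified rule: cut after a P1 residue unless the next one is Pro."""
    sites = []
    for i, residue in enumerate(seq[:-1]):  # last residue has no following bond
        if residue in p1_residues and seq[i + 1] != "P":
            sites.append(i + 1)
    return sites


fusion = insert_after(GVIII_N_TERMINUS, 3, INSERT)
print(fusion)                        # AEGGPGRAFDDPAKAA
print(cleavage_sites(fusion, "KR"))  # trypsin-permitted sites: [7, 14]
print(cleavage_sites(fusion, "FYW")) # chymotrypsin-permitted sites: [9]
```

Arg7-Ala8 and Phe9 match the positions given above. The second trypsin site, after the native Lys8 (position 14 in the fusion), lies further from the N-terminus; the text does not report whether it is cut.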


CONCLUSIONS 

The  viral  DNA  is  packaged  inside  the  filamentous  bacteriophage  capsid  by  a  novel 
mechanism  of  electrostatic  charge  matching,  in  which  the  protein  forms  a  sheath  lined  with 
positive  charges  that  neutralize,  without  base  sequence  specificity,  the  negative  charges  of 
the  sugar-phosphate  backbone.  If  the  positive  charge  density  per  unit  length  inside  the  protein 
tube  is  lowered,  the  DNA  is  forced  to  increase  the  length  it  occupies  by  adopting  a  more 
elongated  configuration,  thereby  lowering  its  negative  charge  density  per  unit  length  to 
match  that  of  the  protein.  The  length  of  the  virion  is  thus  dictated  by  the  length  of  the  DNA 
molecule,  but  we  can  manipulate  the  scale  by  which  it  is  read.  This  property  is  the  basis  of 
the widespread use of the virus (M13) as a cloning and DNA sequencing vector.

Displaying  foreign  peptides  on  the  surface  of  the  bacteriophage  particle  offers  a 
powerful  means  of  studying  the  immunological  recognition  of  proteins.  The  specificity  of 
the  immune  response,  the  ability  to  recruit  helper  T-cells,  the  lack  of  need  for  external 
adjuvants, the structural mimicry of defined peptide epitopes, and the accessibility of the
peptide  inserts  to  analysis  by  means  of  protein  chemical  and  biophysical  techniques,  all 
favour  it  as  a  technique.  It  may  also  prove  to  be  an  inexpensive  and  simple  route  to  the 
production  of  effective  vaccines.  More  work  now  needs  to  be  done  to  determine,  if  possible, 
the  structure  of  a  displayed  peptide  in  comparison  with  the  structure  of  a  natural  epitope, 
and  to  probe  more  fully  the  ways  in  which  T-cell  epitopes  might  similarly  be  displayed  to 
full  advantage. 
1. INTRODUCTION

Within human plasma, apolipoprotein B exists as two antigenically related isoproteins,
designated apoB-100 (Mr 512,000) and apoB-48 (Mr 250,000). The major apoB secreted
in vitro by normal human hepatocytes and Hep G-2 cells is apoB-100 (Edge et al., 1985).
The peripheral metabolism of VLDL and apoB in part determines the level of circulating LDL
(Dolphin, 1985). LDL, which possess apoB-100 as the principal apolipoprotein constituent,
are the major cholesterol-transporting lipoproteins in human plasma. Since both
LDL-cholesterol (Grundy, 1986) and apoB-100 (Brunzell et al., 1984) levels are directly and
positively correlated with premature coronary heart disease, an understanding of the
control of hepatic apoB-100 synthesis and secretion is important. The human apolipoproteins
have been demonstrated to undergo several co-translational and post-translational
modifications, including proteolytic cleavage (Gordon et al., 1983; Stoffel et al., 1983; Zannis et
al., 1983; Bojanovski et al., 1984), glycosylation (Swaminathan and Aladjem, 1976; Lee and
Breckenridge, 1967; Brewer et al., 1974; Zannis and Breslow, 1981), covalent phosphorylation
(Beg et al., 1989; Davis et al., 1984; Sparks et al., 1988; Jackson et al., 1990), fatty
acid acylation (Hoeg et al., 1986; Hoeg et al., 1988) and deamidation (Ghiselli et al.,
1985). These structural alterations may have important physiologic as well as pathologic
roles.
Davis et al. (1984) initially suggested that apoB phosphorylation in rat
hepatocytes may play a role in the intracellular transport of hepatic VLDL during lipid
assembly and secretion. Recently, Sparks et al. (1988) reported that both
apoB-48 and apoB-100 were secreted as phosphoapolipoproteins by primary cultures of rat
hepatocytes. Jackson et al. (1990) have recently demonstrated that addition of insulin to rat
hepatocytes decreased the phosphorylation of apoB-100 with only a small effect on apoB-48.

Phosphorylation of apoB may be an important mechanism for the intracellular
assembly and secretion of VLDL, as demonstrated for vitellogenin (Wang and Williams,
1982). Since apoB-100 is the principal apolipoprotein associated with circulating plasma
cholesterol (Osborne and Brewer, 1977), and human apolipoprotein A-I has recently been
shown to undergo covalent reversible phosphorylation (Beg et al., 1989), we investigated the
potential role of covalent phosphorylation in newly secreted apoB-100 from Hep G-2 cells as
well as in circulating human plasma apoB-100. We have also investigated whether
alterations in intracellular apoB-100 turnover and phosphorylation are associated with
changes in secretion of phosphorylated apoB-100 following cellular induction of protein
kinase C (PKC) in Hep G-2 cells. The data from these studies support the concept that
phospholipase C (PLC)-mediated activation of PKC and covalent phosphorylation play a
role in the regulation of apoB-100 synthesis and/or processing, intracellular transport
and secretion.


2.  RESULTS 

2.1.  Phosphorylation  and  Dephosphorylation  of  Human  Plasma  apoB-100 

Fig. 1 shows the time course of phosphorylation of circulating human plasma
apoB-100 in LDL, mediated by purified protein kinase C. Incorporation of radiolabeled
phosphate increased with time. Incubation of apoB-100:LDL (0.04 mg/ml) with protein
kinase C at 30°C for 60 min was associated with a stoichiometry of four mol phosphate per
mol of apoB-100 (Fig. 1).
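
A phosphorylation stoichiometry of this kind is, in general, obtained by dividing the picomoles of ³²P incorporated (band radioactivity divided by the specific radioactivity of the phosphate donor, assumed here to be [γ-³²P]ATP) by the picomoles of apoB-100 in the band. The sketch below shows that arithmetic; it is not the authors' protocol, we assume the 0.04 mg/ml figure refers to apoB-100 protein, and the counts and specific activity in the example are hypothetical values chosen only to illustrate the calculation.

```python
APOB100_MR = 512_000  # relative molecular mass of apoB-100, as given in the text


def apob_pmol(apob_ug: float) -> float:
    """Convert micrograms of apoB-100 to picomoles:
    ug * 1e-6 (g/ug) / Mr (g/mol) * 1e12 (pmol/mol) = ug * 1e6 / Mr."""
    return apob_ug * 1e6 / APOB100_MR


def phosphate_stoichiometry(cpm_in_band: float, cpm_per_pmol: float, apob_ug: float) -> float:
    """mol phosphate incorporated per mol apoB-100.
    cpm_per_pmol: specific radioactivity of the (assumed) [gamma-32P]ATP donor."""
    pmol_phosphate = cpm_in_band / cpm_per_pmol
    return pmol_phosphate / apob_pmol(apob_ug)


# 0.04 mg/ml = 40 ug/ml of apoB-100, i.e. about 78 pmol/ml (about 78 nM)
print(f"{apob_pmol(40.0):.1f} pmol/ml")

# Hypothetical band: 1 ug apoB-100 carrying 7812.5 cpm, donor at 1000 cpm/pmol
print(phosphate_stoichiometry(7812.5, 1000.0, 1.0))  # 4.0
```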

Figure 1. In vitro phosphorylation of human plasma apoB-100:LDL. Purified human plasma apoB-100:LDL
(0.04 mg/ml) was phosphorylated in the presence of purified rat brain protein kinase C (0.19 mg/ml). At the
indicated time intervals, aliquots were analyzed by NaDodSO4-PAGE, and ³²P-apoB-100 bands were cut from
the gel and analyzed for ³²P-bound radioactivity.

Figure 2. Dephosphorylation of phosphorylated ³²P-apoB-100:LDL. A, An aliquot (40 µl) of maximally
phosphorylated (30°C, 3 h, 4 mol/mol) apoB-100:LDL was incubated at 37°C in a total volume of 700 µl in a
buffer containing 50 mM imidazole, pH 7.5, 1 mM EDTA, 2 mM dithiothreitol, 100 mM sucrose and 250 mM
NaCl. ³²P-apoB-100:LDL samples (3.1 µg) were incubated with either NaF-inactivated rat liver type I
phosphatase or active (NaCl-treated) phosphatase (0.078 mg/ml). At the indicated time intervals, 100 µl
aliquots were mixed with 100 µl of buffer B and analyzed by NaDodSO4-PAGE for ³²P-bound apoB-100:LDL.
After autoradiography of the dried NaDodSO4 gel, ³²P-apoB-100:LDL bands were cut and analyzed for
radioactivity. B, 40 µl (3.1 µg) of phosphorylated ³²P-apoB-100:LDL (4 mol PO4/mol) was incubated at 37°C in
a total volume of 700 µl in a buffer containing 25 mM glycine-HCl, pH 9.6, and 50 mM MgCl2. Samples were
incubated with either inactivated (boiled) or active alkaline phosphatase (1463 units). At the indicated time
intervals, 100 µl aliquots were analyzed by NaDodSO4-PAGE and residual radioactivity in the
³²P-apoB-100:LDL band was quantified as described in panel A.

Dephosphorylation of maximally phosphorylated ³²P-apoB-100 (4 mol of phosphate/mol)
with hepatic phosphoprotein phosphatase I (Brandt et al., 1975) was associated with a
time-dependent loss of ³²P-bound radioactivity (Fig. 2A). Incubation of ³²P-apoB-100 with
alkaline phosphatase revealed a similar time-dependent dephosphorylation (Fig. 2B).
Dephosphorylation of ³²P-apoB-100 with hepatic phosphatase I and alkaline phosphatase
was associated with approximately 85% loss of radioactivity in the apoB-100 band in 120
and 15 min, respectively (Fig. 2A and B). When purified human plasma apoB-100:LDL was
first treated with phosphatase and then phosphorylated with PKC, the degree of
phosphorylation did not increase in comparison to control apoB-100 treated with inactivated
phosphatase (data not shown). These results suggest that human plasma apoB-100 exists
primarily in the dephosphorylated form.
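
If the loss of ³²P is assumed to follow simple first-order kinetics (our assumption; the text gives only the extent of loss at one time point for each enzyme), the 85% loss can be converted into apparent rate constants and half-lives for comparison. This is a rough back-of-envelope sketch, not a fit to the data in Fig. 2; the function name is hypothetical.

```python
import math


def first_order_rate_constant(fraction_remaining: float, minutes: float) -> float:
    """Apparent first-order rate constant (per min) from the fraction of label
    remaining after `minutes`: f = exp(-k t), hence k = -ln(f) / t."""
    if not 0.0 < fraction_remaining < 1.0:
        raise ValueError("fraction_remaining must lie between 0 and 1")
    return -math.log(fraction_remaining) / minutes


# About 85% of the 32P lost (about 15% remaining) at the times given in the text
for enzyme, minutes in [("hepatic phosphatase I", 120.0), ("alkaline phosphatase", 15.0)]:
    k = first_order_rate_constant(0.15, minutes)
    half_life = math.log(2) / k
    print(f"{enzyme}: k = {k:.4f} per min, half-life = {half_life:.1f} min")
```

Under this assumption the apparent half-lives are roughly 44 min and 5.5 min; their eightfold ratio simply restates the 120 min versus 15 min times.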

2.2. Isolation and Analysis of Thrombolytic ³²P-Peptides from ³²P-ApoB-100:LDL

Human plasma apoB-100:LDL was maximally phosphorylated (4 mol PO4/mol) by
protein kinase C, purified by ultracentrifugation and digested with human thrombin. The
digested apoB-100 was analyzed by NaDodSO4-PAGE. After 18 h of digestion at 37°C,
80-90% of apoB-100 protein was cleaved into four major peptides: T1 (Mr 385,000),
T2 (Mr 170,000), T3 (Mr 238,000) and T4 (Mr 145,000) (Fig. 3A,
lanes 1-3). These results are consistent with the thrombin peptides generated from
unphosphorylated LDL (Fig. 3A, lanes 4-6) and with previously published reports describing
the thrombin-mediated cleavage of human apoB-100 (Cardin et al., 1984; Knott et al.,
1985). Autoradiography of the gel revealed ³²P-radioactive bands corresponding to the T1, T2,
T3 and T4 peptides (Fig. 3B). Analysis of the protein-bound radioactivity and quantification
[Figure 3: NaDodSO4 gel (A) and autoradiogram (B), lanes 1-6; bands labeled Apo B-100, T1, T3, T2, T4]

Figure 3. Analysis of 32P-peptides generated by thrombin-mediated digestion of 32P-apoB-100:LDL. Protein kinase C-phosphorylated 32P-apoB-100:LDL (4 mol PO4/mol) was purified by ultracentrifugation, digested with thrombin and delipidated. The 32P-peptides thus generated were analyzed by NaDodSO4-PAGE as described under 'Methods'. The gel was dried and autoradiographed. The stained bands of the T peptides and the corresponding radioactive bands in the autoradiogram were quantified by scanning, after which each T peptide band was cut and analyzed for apoB-100-bound radioactivity. A, NaDodSO4 gel: lane 1 is phosphorylated 32P-apoB-100:LDL (8.5 µg). Lanes 2 and 3 represent the stained T peptide bands (T1, T2, T3 and T4) generated after digestion (37°C, 18 h) of 32P-apoB-100:LDL (68 µg) with two concentrations (32 and 127 units) of thrombin, respectively. Lane 4 depicts the native apoB-100 band (8 µg). Lanes 5 and 6 represent the stained T peptides after hydrolysis of unlabeled apoB-100:LDL (64 µg) with two concentrations (32 and 127 units) of thrombin, respectively. B, autoradiogram of the NaDodSO4 gel shown in A. Lane 1 shows the 32P-apoB-100 band. Lanes 2 and 3 represent the radiolabeled peptides T1, T2, T3 and T4 after digestion of 32P-apoB-100:LDL with 32 and 127 units of thrombin, respectively.


of the protein content within the band of each peptide revealed that the T3 and T4 peptides were each associated with approximately one mol of 32P-phosphate, whereas the T2 peptide was associated with 2 mol of phosphate.
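These peptide data can be cross-checked with simple arithmetic. T3 and T4 are generated by further cleavage of T1 (Cardin et al., 1984; Knott et al., 1985), so their apparent masses should add up to roughly that of T1. The phosphate found in the non-overlapping fragments T2, T3 and T4 should also account for the 4 mol PO4/mol of the intact protein. The sketch below encodes that bookkeeping using only values reported in the text. The names `FRAGMENTS` and `check_balance` are hypothetical.

```python
# Apparent molecular masses (Da) from NaDodSO4-PAGE and phosphate content
# (mol/mol) reported in the text for the thrombin peptides of 32P-apoB-100:LDL.
T1_MASS = 385_000
FRAGMENTS = {
    "T2": (170_000, 2),
    "T3": (238_000, 1),
    "T4": (145_000, 1),
}


def check_balance(t1_mass: int, fragments: dict[str, tuple[int, int]],
                  whole_protein_mol_p: int = 4) -> tuple[int, float, bool]:
    """Compare T3 + T4 with T1 by mass, and sum phosphate over non-overlapping fragments.

    T1 is left out of the phosphate sum because T3 and T4 are derived from it.
    """
    t3_plus_t4 = fragments["T3"][0] + fragments["T4"][0]
    mass_gap_pct = 100.0 * (t3_plus_t4 - t1_mass) / t1_mass
    total_p = sum(mol_p for _, mol_p in fragments.values())
    return t3_plus_t4, mass_gap_pct, total_p == whole_protein_mol_p


t3_t4, gap, balanced = check_balance(T1_MASS, FRAGMENTS)
print(f"T3 + T4 = {t3_t4} Da ({gap:+.1f}% vs T1); phosphate balance holds: {balanced}")
```

T3 and T4 together come to 383,000 Da, within about 0.5% of T1. The apparent masses of T2, T3 and T4, however, sum to about 553 kDa, which is more than the 512 kDa assumed for apoB-100. This is a reminder that gel-based masses are only approximate.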

Autoradiography of the thin-layer cellulose chromatographic sheet following electrophoresis of the acid hydrolysate of 32P-apoB-100 revealed radioactivity only in the phosphoserine band (Fig. 4, lane 1), which comigrated with the phosphoserine standard detected by ninhydrin staining (Fig. 4, lane 2). These results demonstrate that PKC-mediated phosphorylation occurred only on serine residues of circulating human plasma apoB-100:LDL (Fig. 4).

2.3.  Secretion  of  Phosphorylated  apoB-100  by  Hep  G-2  Cells 

Incubation of Hep G-2 cells with ortho[32P]phosphate was associated with the synthesis and secretion of radiolabeled apoB-100:LDL into the media. Immunoprecipitation of secreted 32P-apoB-100 with a monospecific anti-apoB-100 IgG, followed by NaDodSO4-PAGE of the immunoprecipitates, revealed a protein band that comigrated with the apoB-100 standard. Autoradiography of the gel revealed a radioactive band which comigrated with the


-  P-Ser 

-  P-Thr 

-  P-Tyr 


Figure 4. Autoradiogram of 32P-phosphoamino acids from an acid hydrolysate of phosphorylated human plasma 32P-apoB-100:LDL. Purified 32P-apoB-100:LDL (4 mol PO4/mol) was hydrolyzed with HCl. The hydrolysate was analyzed for 32P-phosphoamino acids, and an autoradiogram was prepared. Lane 1 represents the migration of 32P-phosphoserine. The migrations of the standard phosphoserine (P-Ser), phosphothreonine (P-Thr) and phosphotyrosine (P-Tyr) are indicated in lane 2. Pi denotes free 32P-phosphate.


Figure 5. Analysis of media 32P-apoB-100:LDL secreted by Hep G-2 cells incubated in the absence (control) and presence of PLC. After incubation of Hep G-2 cells with and without PLC for 5 h, media were collected, and the secreted 32P-apoB-100:LDL in each group was purified by ultracentrifugation. Aliquots (2.1 µg control; 1.6 µg PLC-treated) of 32P-apoB-100:LDL were analyzed by NaDodSO4-PAGE. The gel was dried and an autoradiogram was prepared, after which the 32P-apoB-100 bands were cut and quantified for radioactivity. Lanes 1 and 2 depict the secreted radiolabeled apoB-100 bands from control and PLC-treated cells, respectively. Lane 3 represents the band of in vitro phosphorylated human plasma 32P-apoB-100:LDL.




stained protein band and the radioactive band of plasma phosphorylated apoB-100, similar to Fig. 5, lane 1.

Purification of media 32P-apoB-100:LDL secreted over a period of 5 h by two successive ultracentrifugations, followed by NaDodSO4-PAGE, demonstrated that newly secreted apoB-100 from Hep G-2 cells was phosphorylated and comigrated with in vitro phosphorylated human plasma 32P-apoB-100. Autoradiography of the NaDodSO4 gel revealed phosphorylation of purified media 32P-apoB-100 (Fig. 5, lane 1) and its comigration with plasma 32P-apoB-100 (Fig. 5, lane 3). NaDodSO4-PAGE of immunoprecipitated cellular 32P-apoB-100 from Hep G-2 cells, followed by autoradiography of the gel, revealed that newly synthesized intracellular apoB-100 was phosphorylated, consistent with the phosphorylation of secreted media apoB-100 (Table I).

Thrombin digestion of secreted 32P-apoB-100 isolated by ultracentrifugation (as above), followed by NaDodSO4-PAGE and autoradiography of the gel, demonstrated protein-bound 32P radioactivity in the T1, T2, T3 and T4 peptides, similar to that shown in Fig. 3 for in vitro phosphorylated plasma apoB-100 (data not shown).

Phosphoamino acid analysis of 32P-apoB-100 secreted by Hep G-2 cells revealed phosphorylation only on serine residues (data not shown). These results are identical to


Table I. Effect of protein kinase C activation on the phosphorylation and secretion of apo B-100:LDL by Hep G-2 cells

Incubation   PLC concentration   32P-Apo B-100 scanning units (% of control)
(min)        (munits/ml)         Medium      Cellular
30           0                   100         100
30           10                  235         196
60           0                   100         100
60           10                   52          60
300          0                   100         100
300          10                   55          94

Confluent Hep G-2 cells pulsed with ortho[32P]phosphate for 3 h were incubated with (10 munits/ml) and without phospholipase C for 30, 60 and 300 minutes, after which media and cellular [32P]apo B-100 were immunoprecipitated with a monospecific rabbit anti-human apo B-100 IgG. The immunoprecipitates were analyzed by NaDodSO4-PAGE, the gel was autoradiographed, and the autoradiogram was scanned.
Figure 6. Hepatic phosphatase I-mediated dephosphorylation of phosphorylated 32P-apoB-100:LDL secreted by control and PLC-treated Hep G-2 cells. Secreted 32P-apoB-100:LDL from untreated control and PLC-treated Hep G-2 cells (5 h) was purified, and aliquots (2.1 µg, control; 1.2 µg, PLC-treated) were incubated at 37°C for 16 h with either NaF-treated inactive phosphatase I or NaCl-treated active phosphatase I (1.29 mg/ml). The samples were analyzed by NaDodSO4-PAGE, and an autoradiogram of the gel was prepared. Lanes 1 and 2 show the autoradiogram of control (inactive phosphatase-treated) and dephosphorylated 32P-apoB-100, respectively, secreted by untreated control cells, whereas lanes 3 and 4 represent the inactive phosphatase-treated and active phosphatase-treated (dephosphorylated) 32P-apoB-100:LDL from PLC-treated cells, respectively. Lane 5 is the band of in vitro phosphorylated human plasma 32P-apoB-100.


and  consistent  with  the  in  vitro  phosphorylation  of  human  plasma  apoB-100  mediated  by 
PKC  as  depicted  in  Fig.  4. 

Incubation of purified 32P-apoB-100:LDL secreted by Hep G-2 cells with hepatic phosphatase-I resulted in the loss of >95% of the 32P-radioactivity in the apoB-100 band, compared with controls containing inactive phosphatase (Fig. 6, lanes 1 and 2).

2.4.  Phospholipase  C-Mediated  Induction  of  Protein  Kinase  C  and  the 
Phosphorylation  of  apoB-100  in  Hep  G-2  Cells 

To demonstrate the physiological relevance of PKC-mediated in vitro phosphorylation, we initiated a series of studies to evaluate in vivo phospholipase C-mediated activation of PKC and the potential impact of increased phosphorylation on cellular and secreted apoB-100 in Hep G-2 cells. Cultures were exposed to control media or to media containing 10 munits/ml of phospholipase C for various time intervals. At the end of each incubation, cells were harvested and separated into membrane and cytosol fractions to determine PKC activity. Exposure of cells to phospholipase C resulted in a transient 2.5-fold increase in membrane-associated PKC activity, which reached a maximum after 15 min and then declined (Fig. 7). To evaluate the effects of PLC-mediated induction of PKC on the phosphorylation and secretion of apoB-100, Hep G-2 cells were pulsed with ortho[32P]PO4 for 3 h and then incubated with and without PLC for 30, 60 and 300 min. After 30 min incubation of Hep G-2


Figure 7. Activation of protein kinase C by phospholipase C. Hep G-2 cells were incubated for 18 h in DMEM and then exposed to 10 munits/ml of PLC for the indicated times at 30°C. Following incubation, membrane and cytosol fractions were prepared. The solubilized membrane fractions were subjected to DEAE chromatography, and fractions eluted with 200 mM NaCl were assayed for PKC activity. Results are expressed as percent of membrane-associated activity compared with controls. Each point represents the average of duplicate assays.
Table II. Specific activity of purified media [32P]-labeled apo B-100:LDL secreted from control and phospholipase C-treated Hep G-2 cells

                   Apo B-100 specific activity
Treatment          (dpm/µg protein)    % of control    Ratio(a)
Control            303                 100             -
Phospholipase C    578                 191             1.9

Hep G-2 cells were pulsed with ortho[32P]phosphate for 3 h and then incubated for 5 h with and without phospholipase C, after which radiolabeled apoB-100:LDL in the media was purified by two sequential ultracentrifugations. The purified [32P]-apoB-100:LDL was dialyzed, concentrated and analyzed for protein and apo B-100 radioactivity.
(a) Ratio of the specific activity of 32P-apo B-100 secreted by phospholipase C-treated cells to that secreted by untreated control Hep G-2 cells.


cells with PLC, a significant increase in the degree of phosphorylation of both newly synthesized cellular (196%) and secreted (235%) 32P-apoB-100 was evident compared with untreated controls (100%, Table I). After 60 min of incubation, phosphorylated 32P-apoB-100 in the media and cells of PLC-treated cultures had declined significantly, by 48% and 40%, respectively, relative to the corresponding controls (Table I). This decrease in 32P-apoB-100 appears to be due to enhanced phosphorylation by activated PKC and increased degradation of 32P-apoB-100 in PLC-treated cells. At the end of a 5 h incubation with PLC, cellular 32P-apoB-100 content had returned to near-control levels (94%), apparently because PLC-mediated stimulation of PKC had subsided. However, net secretion of 32P-apoB-100 during the 5 h incubation of Hep G-2 cells with PLC remained 45% inhibited (Table I). Surprisingly, the specific activity (dpm/µg protein) of secreted 32P-apoB-100 was significantly increased (Table II): purified secreted phosphorylated apoB-100 from Hep G-2 cells treated with PLC for 5 h had a specific activity of 191% of that of untreated controls (100%, Table II), a 1.9-fold increase. These results are consistent with increased phosphorylation, increased intracellular degradation and decreased secretion of 32P-apoB-100 in PKC-stimulated Hep G-2 cells.
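Table I expresses each PLC-treated value as a percentage of its time-matched control. The change relative to control is therefore the tabulated value minus 100; for example, 52% of control is a 48% decrease. The short sketch below applies that conversion to the Table I values. The dictionary and function names are hypothetical.

```python
# Table I: 32P-apoB-100 scanning units in PLC-treated cells (10 munits/ml),
# expressed as % of the time-matched untreated control.
TABLE_I = {
    30: {"medium": 235, "cellular": 196},
    60: {"medium": 52, "cellular": 60},
    300: {"medium": 55, "cellular": 94},
}


def percent_change(pct_of_control: float) -> float:
    """Signed change relative to control, where control = 100%."""
    return pct_of_control - 100.0


for minutes, row in TABLE_I.items():
    for fraction, value in row.items():
        change = percent_change(value)
        print(f"{minutes:>3} min {fraction:<8}: {value:>3}% of control, change {change:+.0f}%")
```

At 60 min this yields decreases of 48% (medium) and 40% (cellular). At 300 min it yields a 45% decrease in the medium and only a 6% decrease in the cells.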

SDS-PAGE of thrombin-digested secreted 32P-apoB-100:LDL from control and PLC-treated Hep G-2 cells revealed hydrolysis into T peptides, with a stained peptide pattern identical to that of human plasma apoB-100:LDL (Fig. 3A). The autoradiogram of this gel showed that the 32P-labeled T peptides from both control and PLC-treated cells had a pattern similar to that of the 32P-peptides obtained by thrombin digestion of in vitro phosphorylated plasma 32P-apoB-100 (Fig. 3B). Quantification of protein and 32P-radioactivity in the T peptides was consistent with incorporation of one mol of phosphate each in the T3 and T4 peptides and two mol in the T2 peptide, as seen in T peptide mapping of plasma 32P-apoB-100. The exception was that higher radioactive counts were present in T peptides derived from the 32P-apoB-100 of PLC-treated Hep G-2 cells (Table II).

Phosphoamino acid analysis of media 32P-apoB-100 from PLC-treated cells showed phosphorylation only on serine residues, similar to untreated controls and consistent with the data presented in Fig. 4.
2.5. Pulse-Chase Studies on the Effect of PLC on the Intracellular Degradation and Secretion of 35S-apoB-100

To confirm that increased PKC-mediated phosphorylation of apoB-100 caused increased intracellular degradation and decreased secretion, Hep G-2 cells were pulsed with 35S-methionine/35S-cysteine and chased in the presence or absence of PLC. PKC activation in PLC-treated cells induced a significant increase in the intracellular degradation of apoB-100 relative to untreated cells. No decline was observed at 30 min, and degradation reached a maximum (70%) beginning 60 min into the chase (Fig. 8A). Secretion of radiolabeled apoB-100:LDL from cells chased in the presence of PLC was significantly reduced, with a decline of approximately 50% at 90, 120 and 180 min of the chase (Fig. 8B). The data in Fig. 8A and B suggest that the enhanced degradation and reduced secretion of 35S-apoB-100 in PLC-treated cells may be due to increased phosphorylation.
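In Fig. 8A each intracellular series is normalized to its 15-min peak area, which is taken as 100%. The sketch below shows that normalization, together with one possible way to express the extra loss in PLC-treated cells relative to controls. The text does not say exactly how the 70% figure was computed, so this comparison is an assumption. The function names and the example numbers are hypothetical and are not the Fig. 8 measurements.

```python
def normalize_to_reference(series: dict[int, float], ref_time: int = 15) -> dict[int, float]:
    """Express each value as a percentage of the value at ref_time (taken as 100%)."""
    ref = series[ref_time]
    if ref <= 0:
        raise ValueError("reference value must be positive")
    return {t: 100.0 * v / ref for t, v in series.items()}


def percent_lower_than_control(control: dict[int, float], treated: dict[int, float],
                               ref_time: int = 15) -> dict[int, float]:
    """Percent by which the normalized treated value falls below the normalized control.

    Only time points present in both series are compared.
    """
    c = normalize_to_reference(control, ref_time)
    t = normalize_to_reference(treated, ref_time)
    return {time: 100.0 * (c[time] - t[time]) / c[time] for time in sorted(c) if time in t}


# Hypothetical densitometric peak areas (arbitrary units), made up only to
# exercise the functions; these are NOT the measurements behind Fig. 8A.
control = {15: 1000.0, 30: 950.0, 60: 900.0}
plc = {15: 800.0, 30: 760.0, 60: 216.0}
print(percent_lower_than_control(control, plc))
```

Normalizing each series to its own 15-min value separates differences in labeling efficiency at the end of the pulse from differences in subsequent turnover.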


3.  DISCUSSION 

As constituents of lipoprotein particles, the apolipoproteins are known to interact noncovalently with lipids and to play a pivotal role in directing the metabolism of the lipoprotein transport system (Gofman et al., 1954 and Brewer, 1981). Recent reports indicate that human plasma apoA-I (HDL), as well as cellular and secreted apoA-I:HDL of Hep G-2 cells, undergo post-translational modification involving reversible phosphorylation (Beg et al., 1989). In the present report we give the first description of in vitro phosphorylation




Figure 8. Effect of protein kinase C activation on intracellular degradation and secretion of 35S-apoB-100:LDL. Hep G-2 cells were pulsed (15 min) with 35S-methionine/35S-cysteine, incubated with and without PLC, and chased for 15, 30, 60, 90, 120 and 180 min. At the indicated times, cellular and secreted apoB-100:LDL immunoprecipitates were subjected to NaDodSO4-PAGE, and an autoradiogram of the dried gel was prepared. The stained protein and radioactive bands corresponding to apoB-100 were quantified by densitometric scanning, after which the 35S-apoB-100 bands were cut and analyzed for radioactivity. Open circles, control cells; closed circles, PLC-treated cells. A, intracellular 35S-apoB-100 bands quantified by densitometric scanning; the area of the peak at 15 min into the chase is taken as 100%. B, apoB-100:LDL secreted into the medium. Ordinate, 35S-apoB-100 content expressed as dpm per milligram scanning units of protein. Abscissa, time after initiation of the cold methionine chase. Each point represents the average of duplicate assays.
and dephosphorylation of circulating human plasma apoB-100:LDL. In vitro phosphorylation of human apoB-100:LDL was mediated by a purified rat brain protein kinase C with a stoichiometry of four mol of phosphate per mol of apoB-100, assuming a molecular mass of 512 kDa for apoB. Thrombin digestion of maximally phosphorylated 32P-apoB-100:LDL demonstrated that the T1 (T3 and T4) and T2 fractions are phosphorylated. The T2 fraction has been shown to be generated by thrombin cleavage at lysine residue 3,249 from the amino-terminal end of apoB-100:LDL (Cardin et al., 1984 and Knott et al., 1985). Analysis of the radioactivity in each thrombin peptide revealed that the T1 and T2 peptides were each associated with two mol of phosphate, whereas the T3 and T4 peptides, generated by cleavage of the T1 peptide (Cardin et al., 1984 and Knott et al., 1985), contained one mol of phosphate each. Phosphoamino acid analysis of phosphorylated 32P-apoB-100 demonstrated that all sites of phosphorylation are serine residues. These results establish that PKC-mediated phosphorylation of human plasma apoB-100 is specific to serine residues.
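A stoichiometry such as 4 mol PO4 per mol apoB-100 (with apoB taken as 512 kDa) comes from dividing picomoles of 32P incorporated by picomoles of protein. The sketch below assumes labeling with [γ-32P]ATP of known specific radioactivity (dpm/pmol), as is usual for in vitro kinase reactions. That labeling method is not stated in this section, and the function and parameter names are hypothetical.

```python
APOB100_MR = 512_000  # g/mol; the molecular mass assumed in the text


def pmol_protein(micrograms: float, mr: float = APOB100_MR) -> float:
    """Convert a protein mass in µg to pmol: µg * 1e-6 g / Mr * 1e12 pmol/mol."""
    return micrograms * 1e6 / mr


def mol_phosphate_per_mol(dpm: float, atp_sa_dpm_per_pmol: float,
                          micrograms: float, mr: float = APOB100_MR) -> float:
    """Stoichiometry = pmol 32P incorporated / pmol protein.

    atp_sa_dpm_per_pmol is the (hypothetical) specific radioactivity of the
    [gamma-32P]ATP, decay-corrected to the counting date; dpm is assumed
    already corrected for counting efficiency.
    """
    if atp_sa_dpm_per_pmol <= 0 or micrograms <= 0:
        raise ValueError("specific activity and protein mass must be positive")
    return (dpm / atp_sa_dpm_per_pmol) / pmol_protein(micrograms, mr)


# The 3.1 µg aliquot of Fig. 2B is about 6.05 pmol of apoB-100,
# so 4 mol PO4/mol corresponds to about 24.2 pmol of phosphate.
apob_pmol = pmol_protein(3.1)
print(f"{apob_pmol:.2f} pmol apoB-100, {4 * apob_pmol:.1f} pmol phosphate at 4 mol/mol")
```

For example, with a hypothetical ATP specific activity of 1,000 dpm/pmol, about 24,200 dpm bound to 3.1 µg of apoB-100 would correspond to 4 mol/mol.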

Dephosphorylation of 32P-apoB-100 (4 mol phosphate per mol) with either hepatic phosphatase-I or alkaline phosphatase resulted in extensive (approximately 85%) loss of esterified 32P-phosphate. These results suggest a possible in vivo role for phosphorylation and dephosphorylation in the regulation of apoB-100:LDL synthesis, processing and secretion. The extent of PKC-mediated phosphorylation of purified apoB-100 after in vitro dephosphorylation was similar to that of apoB-100 that had not been dephosphorylated, consistent with the concept that circulating plasma apoB-100 is dephosphorylated.

The physiological relevance of in vitro protein kinase C-mediated phosphorylation and phosphoprotein phosphatase-dependent dephosphorylation of human plasma apoB-100:LDL was confirmed in vivo in the human hepatoma cell line Hep G-2. ApoB-100 was shown to be secreted as a phosphoapolipoprotein. Several lines of evidence established that both cellular and secreted apoB-100 of Hep G-2 cells were modified post-translationally by covalent phosphorylation: incorporation of radiolabeled phosphate; removal of 32PO4 upon dephosphorylation with phosphatases; immunoprecipitation of phosphorylated apoB-100 with a monospecific anti-human apoB-100 antibody; isolation of secreted 32P-apoB-100:LDL by ultracentrifugation; and immunoblotting. Thrombin digestion of secreted 32P-apoB-100:LDL revealed phosphorylation of the T1 (T3 and T4) and T2 peptides. Although the stoichiometry of phosphorylation of secreted 32P-apoB-100 from Hep G-2 cells could not be quantified, the level of 32PO4 incorporation and the protein mass in each peptide suggest that in vivo phosphorylation of apoB-100 is similar to the protein kinase C-mediated in vitro phosphorylation of circulating human plasma apoB-100. Phosphoamino acid analysis of secreted 32P-apoB-100 demonstrated phosphorylation of serine residues only, as in in vitro phosphorylated apoB-100. These results establish that phosphorylation of human apoB-100 in vitro and in vivo (Hep G-2 cells) occurs on specific serine residues. They are consistent with previous reports showing that in rat hepatocytes both apoB-48 (Davis et al., 1984 and Sparks et al., 1988) and apoB-100 (Sparks et al., 1988) are secreted as phosphoproteins with the radiolabel on serine residues (Davis et al., 1984 and Sparks et al., 1988). The sites of apoB-100 phosphorylation currently remain undefined, largely because of the problems of working with apoB-100, such as the self-association and unique hydrophobic nature of this large (512 kDa) glycosylated apolipoprotein. However, the data presented in this manuscript indicate the possibility of four sites of phosphorylation in the apoB-100 molecule.

We investigated the possibility of PKC-mediated phosphorylation of apoB-100 in Hep G-2 cells by stimulating cellular PKC activity with phospholipase C. We also examined the regulatory function of enhanced apoB-100 phosphorylation on apoB synthesis, degradation and secretion. PLC has been shown to stimulate PKC by raising intracellular concentrations of diacylglycerol (Allan et al., 1978 and Kaibuchi et al., 1983), which is generated by hydrolysis of phosphatidylinositol. The results presented in this manuscript establish that the increase in protein kinase C activity (2.5-fold) observed within 15 minutes of PLC challenge to Hep G-2 cells caused an approximately two-fold increase in the level of phosphorylation of both cellular and secreted apoB-100 within 30 min of incubation. With longer incubation (60 min) with PLC, cellular and secreted phosphorylated 32P-apoB-100:LDL decreased by 40% and 48%, respectively, in comparison with untreated control cells (Table I). After 5 h of incubation, net secreted 32P-apoB-100 remained 45% inhibited but had an approximately two-fold higher specific activity (dpm/µg apoB-100 protein, Table II). Cellular 32P-apoB-100, by contrast, returned to a basal level similar to that of untreated controls, apparently because of the concomitant decline in PKC activity and in the level of cellular apoB-100 phosphorylation (Table I). These results are consistent with increased degradation and decreased secretion resulting from increased phosphorylation of apoB-100:LDL in PKC-stimulated Hep G-2 cells compared with control cells. Phosphatase I-mediated dephosphorylation of secreted 32P-apoB-100 from control and PLC-treated Hep G-2 cells produced a virtual loss of 32P radioactivity in the apoB-100 band in comparison with control samples treated with inactivated phosphatase I. To confirm the enhanced degradation and reduced secretion of phosphorylated apoB-100:LDL, we conducted pulse-chase experiments to monitor the turnover and secretion of intracellular 35S-labeled apoB-100. The data are consistent with increased intracellular degradation of newly synthesized apoB-100 because of increased phosphorylation in PKC-stimulated Hep G-2 cells. Enhanced phosphorylation and intracellular degradation of apoB-100 were coupled with significantly reduced secretion of newly synthesized apoB-100 during the chase period in cells exposed to PLC.
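Table II reports the same comparison in two forms: as percent of control (191%) and as a ratio (1.9). The small helper sketched below derives both values, plus the percent increase, from the two specific activities. It makes explicit that 191% of control corresponds to a roughly 91% (1.9-fold) increase. The function name is hypothetical.

```python
def compare_specific_activity(control_sa: float, treated_sa: float) -> tuple[float, float, float]:
    """Return (fold ratio, % of control, % increase over control)."""
    if control_sa <= 0:
        raise ValueError("control specific activity must be positive")
    ratio = treated_sa / control_sa
    pct_of_control = 100.0 * ratio
    pct_increase = pct_of_control - 100.0
    return ratio, pct_of_control, pct_increase


# Table II: dpm/µg protein of purified secreted 32P-apoB-100:LDL after 5 h
ratio, pct, increase = compare_specific_activity(303.0, 578.0)
print(f"ratio {ratio:.2f}, {pct:.0f}% of control, +{increase:.0f}%")
```

With the Table II values this gives a ratio of about 1.9, or 191% of control, which is an increase of about 91%.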

In this report we have demonstrated the phosphorylation-dephosphorylation of circulating plasma apoB-100, as well as of secreted apoB-100 from control and PLC-treated Hep G-2 cells. Thus, apoB-100 is one of several proteins whose phosphorylation/dephosphorylation state is affected by protein kinases and phosphatases. Recently, Capasso et al. (1989) demonstrated that intact rat liver Golgi vesicles translocate ATP into their cisternal space and use it to phosphorylate a set of secretory proteins. Enhanced degradation of phosphorylated apoB-100 in Hep G-2 cells is consistent with the behavior of several other phosphorylated enzymes and proteins known to be degraded at a faster rate than their dephosphorylated forms (Engstrom et al., 1982; Muller and Holzer, 1981 and Pontremoli et al., 1987). These include HMG-CoA reductase (Parker et al., 1984 and Parker et al., 1989), a transmembrane protein bound to the endoplasmic reticulum. The mechanism of protease activation and its role in the subsequent enhanced degradation of phosphorylated apoB-100 in Hep G-2 cells remain to be elucidated. However, it is intriguing to postulate that Ca2+-signaled protein kinases such as protein kinase C and Ca2+/calmodulin-dependent kinase may act in concert in the differential degradation of phosphorylated and less phosphorylated forms of apoB-100, since these kinases both phosphorylate apoB-100 and activate Ca2+-dependent endoproteases. A recent report has suggested a role for Ca2+-evoked or Ca2+-dependent hormones in the regulation of apoB secretion in Caco-2 cells (Hughes et al., 1988). Ionophore-induced increases in calcium ion availability may be hypothesized to influence the phosphorylation state of apoB-100:LDL by affecting the activities of Ca2+/calmodulin-dependent kinase (Hughes et al., 1988 and Cohen, 1985). Indeed, we have demonstrated, at least in vitro, Ca2+/calmodulin-dependent protein kinase-mediated phosphorylation of human plasma apoB-100, as well as in vitro and in vivo phosphorylation of human plasma apoA-I (Beg et al., 1989).

A potentially important physiological role for PKC-regulated changes in the phosphorylation of apoB-100:LDL could be the regulation of VLDL particle size. Powell and Glenney (1987) demonstrated an increased affinity of dephosphorylated calpactin (lipocortin 1) for phosphatidylserine liposomes compared with the phosphorylated form. Thus, increased phosphorylation of intracellular apoB-100:LDL could result in decreased lipid association and VLDL particle size. Since human liver secretes primarily apoB-100:LDL, our demonstration of PKC effects on apoB-100:LDL phosphorylation and turnover may be relevant to human apoB metabolism. Recently, it has been demonstrated that in cultured human fibroblasts, activation of PKC following binding of HDL (apoA-I and apoA-II) to the HDL receptor is involved in the HDL-mediated translocation and efflux of intracellular cholesterol (Mendez et al., 1991). In summary, our results establish a role for signal transduction through protein kinase C-mediated enhanced phosphorylation of cellular apoB-100:LDL. The more highly phosphorylated apoB-100:LDL is degraded at a faster rate, causing reduced secretion and, in turn, potentially a reduced apoB-100:LDL level in plasma. The site and mechanism of degradation of phosphorylated apoB-100:LDL remain to be established. It is well established that elevated plasma apoB-100 and LDL, the major cholesterol-carrying lipoprotein in human plasma, are directly and positively correlated with premature coronary heart disease. Therefore, mechanism(s) that reduce plasma apoB-100:LDL would be beneficial in the prevention of premature coronary heart disease.
INTRODUCTION

Protein  kinases  play  critical  roles  in  the  regulation  of  cellular  processes.  They  control 
many  of  the  pathways  leading  to  the  biochemical  and  morphological  changes  associated  with 
cellular  growth  and  division  (Dunphy  and  Newport,  1988;  Morgan  et  al.,  1989).  They  also 
serve  as  growth  factor  receptors  and  signal  transducers  and  have  been  implicated  in  cellular 
transformation  and  malignancy  (reviewed  by  Hunter  and  Karin,  1992;  Posada  and  Cooper, 
1992; Birchmeier et al., 1993).

While protein kinases vary widely in their primary structures, each contains a catalytic domain of 250-300 amino acids consisting of 11 highly conserved motifs, or subdomains, separated by areas of reduced conservation (Hanks et al., 1988). The presence of these motifs within a new sequence is strongly predictive of protein kinase activity. The specificity of a protein kinase can also be predicted from the sequence of two of the motifs (VIB and VIII), in which different residues are conserved in the tyrosine-specific and the serine/threonine-specific kinases (Hunter, 1991). Within the two subgroups, protein kinases with similar substrates or modes of activation cluster into families (Hanks et al., 1988). Members of a family share a higher degree of catalytic domain sequence identity with each other than with the rest of their specificity class. Recently, several protein kinases with specificity for both tyrosine and serine/threonine have been reported (e.g., Featherstone and Russell, 1991; Stern et al., 1991; Ben-David et al., 1991); however, they do not form a distinct family grouping on the basis of catalytic domain sequence identity.
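The idea that subdomains VIB and VIII predict specificity can be illustrated with a toy motif matcher. The regular expressions below are simplified consensus sequences commonly quoted from Hanks et al. (1988), for example HRDLKPEN versus HRDLAARN in subdomain VIB. They are approximations supplied here for illustration only. Real kinases frequently deviate from them, and a proper assignment relies on alignment of the whole catalytic domain. The names `PATTERNS` and `specificity_hits` are hypothetical, and the test strings are synthetic fragments.

```python
import re

# Simplified consensus patterns for subdomains VIB and VIII (an approximation
# of the motifs summarized by Hanks et al., 1988; many kinases deviate).
PATTERNS = {
    "Ser/Thr": {"VIB": r"[HY]RDLKPEN", "VIII": r"G[TS]..[YF].APE"},
    "Tyr": {"VIB": r"HRDL[AR]A[AR]N", "VIII": r"P[IV][KR]W[TM]APE"},
}


def specificity_hits(sequence: str) -> dict[str, list[str]]:
    """For each specificity class, list which subdomain patterns occur in the sequence."""
    seq = sequence.upper()
    return {
        cls: [sub for sub, pattern in subs.items() if re.search(pattern, seq)]
        for cls, subs in PATTERNS.items()
    }


# Synthetic test fragments carrying the consensus motifs
print(specificity_hits("AAYRDLKPENLLAAGTPEYLAPEAA"))  # Ser/Thr-type motifs
print(specificity_hits("AAHRDLAARNCLAAPIKWTAPEAA"))   # Tyr-type motifs
print(specificity_hits("HRDLKPENAAPIKWTAPE"))         # one motif of each type
```

A sequence that matches patterns from both classes, as in the last call, is the kind of ambiguous case that dual-specificity kinases can present to simple motif rules.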

Most  protein  kinase  family  members  also  share  other  structural  features  which  reflect 
their  particular  cellular  roles.  These  include  regulatory  domains  that  control  protein  kinase 
activity or interaction with other proteins (Hanks, 1991). Two regulatory elements, originally identified as conserved sequences in members of the src-related kinase family, are the src homology 2 (SH2) and 3 (SH3) domains (Mayer et al., 1988; reviewed by Koch et al., 1991). These domains have been found in a variety of proteins in intracellular signalling pathways, where they link activated cell surface receptors to the G-protein signalling pathway (reviewed by Pawson and Gish, 1992). Furthermore, it has recently been shown that SH3 domains also participate in regulating the activity of GTPase effector proteins (Gout et al., 1993), which are vital for the passage of G-protein signals.

A second type of regulatory domain, usually found in transcription factors
such as those encoded by the oncogenes fos, jun and myc, is the “leucine zipper” (Landschulz et al., 1988).
In this motif, leucine or isoleucine residues are repeated with a heptad periodicity over a
stretch of at least 22 amino acids. This sequence, which has a higher than average content
of charged amino acids, is postulated to form a helix with a hydrophobic “stripe”, or ridge,
of leucines down one face. In the transcription factors, the zipper motif is preceded by a
stretch of basic residues that constitute the DNA binding region. There is strong evidence
that zipper motifs promote dimerization (O’Shea et al., 1991) through hydrophobic
interactions between heptad leucines. Such dimerization appears to activate DNA binding by
orienting the basic side chains of the DNA binding residues to enable correct contact with
the DNA (Vinson et al., 1989).

We have recently identified, in human epithelial tumour cells, two members of a new
family of protein kinases that are unique in having an SH3 domain as well as a novel double
leucine zipper and basic domain within their sequences. In addition, their catalytic domain
structures display similarity to both the tyrosine and serine/threonine specific kinases and
are related to many of the oncogenic protein kinases. Because of this unusual mixture of
domain structures, the kinases have been named “mixed lineage kinases” (MLK’s). The
sequence of a third member of the MLK family has now been published by others (Ezoe et
al., 1994; Gallo et al., 1994; Ing et al., 1994). In the present report we discuss the
characterisation and structure of this new group of cellular control molecules.


CHARACTERISATION  OF  THE  MLK’s 

Isolation,  Expression  and  Chromosomal  Localisation 

A cDNA fragment of MLK1 was first isolated by reverse transcriptase PCR from mRNA of a
human squamous epithelial carcinoma cell line. Clones for human MLK1 and
MLK2 were then isolated from colonic epithelial cDNA libraries. The cDNA sequence
determined for MLK1 codes for 394 amino acids comprising a kinase catalytic domain, two
leucine zippers, a basic domain and a short C-terminal peptide (Dorow et al., 1993). A
full-length MLK2 clone has recently been isolated from a human brain cDNA library. As
well as the kinase catalytic, double leucine zipper and basic domains homologous to those
of MLK1, the MLK2 clone encodes an N-terminal SH3 domain and an extended C-terminal
tail rich in proline and serine (see Figure 1). A human MLK1 genomic fragment, isolated by
hybridisation to a cDNA probe from the MLK2-SH3 domain coding region, is at present
being sequenced. In Northern analysis of more than thirty carcinoma cell lines, mRNAs for
human MLK’s 1 and 2 were expressed at low levels in epithelial cell lines of
breast and colonic origin and in some melanoma cell lines (data not shown). In human tissues,
MLK1 and MLK2 mRNA expression was highest in brain and skeletal muscle. Other tissues
tested, including heart, placenta, lung, liver, kidney and pancreas, showed extremely low
levels of MLK2 mRNA expression, and MLK1 mRNA was undetectable (data not shown).

We have also isolated and sequenced MLK1 clones from a murine brain cDNA
library. These clones code for the kinase catalytic, double leucine zipper and basic domains
as well as part of an extended C-terminal domain (data not shown). There is 98% amino acid
identity between human and mouse MLK1 within the catalytic, zipper and basic domains
(only 2 of the 6 substitutions in a 368 amino acid overlap are non-conservative). The
homology between mouse and human MLK1, however, ends just after the basic domain,
where mouse MLK1 has a serine and proline rich C-terminal domain similar to that of human
MLK’s 2 and 3 (discussed in the C-terminal domain section below).

Figure 1. Schematic representation of the arrangement of structural domains of the MLK proteins. Designations
are SH3 (src homology 3 domain), CAT (kinase catalytic domain), ZIP+B (double leucine zipper and
basic domain) and PRO/SER (proline and serine/threonine rich C-terminal domain).
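As a worked check of the mouse/human MLK1 comparison, the identity follows from the substitution count over the 368-residue overlap; treating the 4 conservative substitutions as matches (an assumption about how similarity would be scored) gives the corresponding similarity:

```latex
\text{identity} = \frac{368 - 6}{368} \times 100\% = \frac{362}{368} \times 100\% \approx 98.4\%,
\qquad
\text{similarity} = \frac{368 - 2}{368} \times 100\% = \frac{366}{368} \times 100\% \approx 99.5\%
```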

Using a panel of human, mouse and Chinese hamster hybrid cell lines, the MLK1
gene was mapped to human chromosome 14 (Dorow et al., 1993). In situ hybridisation of
human chromosomes (Choo et al., 1990) was then used to further localise the MLK1 gene
to 14q24.3-31. This area of chromosome 14 has been shown to be involved in translocations
in a large number of human malignancies (Testa, 1990). Other genes assigned to this region
of human chromosome 14 include those for the 70 kDa heat shock cognate protein-2 (14q22-24
[Harrison et al., 1987]), transforming growth factor beta-3 (14q24 [Barton et al., 1988]) and
the c-fos oncogene (14q24.3 [Ekstrand and Zech, 1987]).

In the last few months, three different laboratories have reported the sequence of a
third member of the MLK family, called variously PTK1 (Ezoe et al., 1994), SPRK (Gallo
et al., 1994) and MLK3 (Ing et al., 1994). These three identical protein sequences were
predicted from nucleotide sequences of melanocyte (Ezoe et al., 1994), haematopoietic cell
(Gallo et al., 1994) and thymus (Ing et al., 1994) cDNA clones. Results presented in each of
these reports indicate that this protein, referred to here as MLK3, is much more widely
expressed than either MLK1 or MLK2, being found in a very large range of human tissues and
cell lines. Gallo et al. (1994) used recombinant MLK3, expressed in mammalian cells, in in
vitro kinase assays to show that the expressed protein auto-phosphorylated on serine and
threonine residues. Ing et al. (1994) have mapped the MLK3 gene to human chromosome
11q13.1-13.3. As discussed in that report, amplifications of this area of chromosome 11 have
been observed with varying frequencies in malignancies of the breast, lung, oesophagus,
bladder, head and neck and in melanomas. Furthermore, Ezoe et al. (1994) showed that
anti-sense oligonucleotides to the region of the MLK3 initiator methionine were able to
inhibit the growth of human melanocytes in culture.


MLK STRUCTURAL DOMAINS

The SH3 Domain

SH2 and SH3 domains are conserved sequences present in a number of proteins
involved in intracellular signalling, including members of the src related family of protein
kinases (Mayer et al., 1988; Koch et al., 1991). SH3 domains are also found in cytoskeletal
proteins (Rodaway et al., 1989). SH2 domains comprise about 100 amino acids and
SH3 domains about 60, each with several highly conserved motifs that form consensus sequences
(Koch et al., 1991). Both domains are involved in protein-protein interactions in the passage
of intracellular signals from activated growth factor receptors (Anderson et al., 1990; Cantley
et al., 1991). SH2 domains bind to specific phosphorylated tyrosines in the cytoplasmic tails
of the activated receptors (Mayer et al., 1991; Cantley et al., 1991), while SH3 domains bind
to proline rich sequences in their target molecules (Ren et al., 1993; Yu et al., 1994). Many
of the proteins to which SH3 domains bind are involved in the control of GTPase activity
and thus participate in the passage of signals through G-protein pathways (Booker et al., 1993;
Egan et al., 1993). Several adaptor proteins have been described which consist only of SH2
and SH3 domains. These proteins link receptor kinase signals to the G-protein pathway. One
such adaptor protein, GRB2, consists of an SH2 domain flanked by N- and C-terminal
SH3 domains (Lowenstein et al., 1992). GRB2 binds activated EGF and PDGF receptors
and connects them to the ras signalling pathway. Of the protein kinases that contain SH3
domains, the MLK’s are the only ones so far described which do not also contain an SH2
domain.

Structural studies of the SH3 domains of a number of proteins have shed light on the
mechanism by which these domains bind their target sequences. Crystal structures of SH3
domains from α-spectrin (Musacchio et al., 1992), c-Fyn (Noble et al., 1993) and Lck (Eck
et al., 1994), and solution structures of those of c-src (Yu et al., 1992), phosphatidylinositol
3′ (PI-3′) kinase (Booker et al., 1993; Koyama et al., 1993) and phospholipase C-γ
(Kohda et al., 1993), have been published. Sequence alignments based on structural data
revealed a series of highly conserved aromatic residues (see Figure 2) located in β-strands
joined by variable loops (Koyama et al., 1993). In the folded structure, conserved residues
line a binding pocket into which the aromatic side chains protrude (Booker et al., 1993; Yu
et al., 1994). Part of one variable loop forms an end of the binding pocket, leading to the
speculation that residues within this loop may contribute to the fine specificity of the domain.
The SH3 domains of PI-3′ kinase and of the neuronal form of the src oncogene (N-src)
each have an insert in this loop, containing 15 residues in the case of PI-3′ kinase and 6 for
N-src. The placement of these inserts near the binding pocket suggests that they may play a
role in target recognition by these SH3 domains.

A comparison of the SH3 domains of MLK’s 2 and 3 with a number of related SH3
domains is shown in Figure 2. The degree of amino acid conservation between the
MLK-SH3 domains is very high, with only 4 of 20 substitutions in 61 residues being
non-conservative. There is, however, one 7-residue peptide, located in a variable region
between the last two consensus motifs (boxed in Figure 2), in which there are 4 substitutions
and one deletion in MLK3 compared with MLK2. While all but one of the substitutions are
conservative, the deletion may cause a more striking difference in conformation between the
two molecules in this peptide than in the rest of the domain.
Figure 2. Alignment of the SH3 domain amino acid sequences of MLK’s 2 and 3 with SH3 domains of selected
signalling and cytoskeletal proteins. Alignments were done relative to the MLK2 sequence, with gaps
introduced to maximise matches. SH3 domains are from GRB-2C (C-terminal SH3 domain of the growth factor
receptor binding protein 2), PLC (phospholipase C-γ), C-src (the cellular src oncogene), N-src (the neuronal
form of src), SPC (the cytoskeletal protein spectrin) and PI-3K (the p85 subunit of phosphatidylinositol
3′ kinase).
Among other SH3 containing proteins, the SH3 domain of the MLK’s is most closely
related to the C-terminal SH3 domain of the adaptor GRB2. The MLK-SH3 domains,
however, differ from those of GRB2 in that they contain a 5-residue insert, analogous to that
of N-src. Whereas 4 of the 6 N-src insert residues are charged, those of the MLK
inserts are mainly hydrophobic, suggesting substantial differences in specificity between the
SH3 domains of these molecules.

The  Kinase  Catalytic  Domain 

The predicted amino acid sequences of the catalytic domains of MLK’s 1-3 are shown in
Figure 3. Each sequence contains all of the amino acid residues conserved in the 11 kinase
catalytic subdomains described by Hanks et al. (1988) and postulated to be necessary for
protein kinase activity. While in subdomain VIb the MLK sequences contain a lysine residue
thought to be diagnostic of serine/threonine specificity, their overall catalytic domain
structure is more closely related to that of the tyrosine kinases than to that of the
serine/threonine kinases. In several conserved motifs, the sequences of the MLK’s are related
only to those of the tyrosine kinases. In particular, two tryptophan residues in subdomain IX
and a motif within subdomain XI (Cys-Trp-X-X-Asp/Glu-Pro-X-X-Arg-Pro-X-Phe) are highly
conserved in the tyrosine protein kinases, the raf/mos oncogene kinases and the MLK’s (see
Figure 3), but are not found in other members of the serine/threonine specific class (Hanks
et al., 1988).
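The subdomain XI motif quoted above can be written directly as a regular expression over one-letter amino acid codes (C = Cys, W = Trp, D/E = Asp/Glu, P = Pro, R = Arg, F = Phe, "." = any residue X). A minimal Python sketch; the function name and the test peptide are hypothetical.

```python
import re

# Subdomain XI motif quoted in the text:
# Cys-Trp-X-X-Asp/Glu-Pro-X-X-Arg-Pro-X-Phe, where X is any residue.
SUBDOMAIN_XI = re.compile(r"CW..[DE]P..RP.F")


def find_subdomain_xi(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched peptide) for each motif occurrence."""
    return [(m.start(), m.group()) for m in SUBDOMAIN_XI.finditer(seq.upper())]


# Hypothetical test peptide that satisfies the motif.
print(find_subdomain_xi("AACWRKEPEERPTFAA"))  # [(2, 'CWRKEPEERPTF')]
```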
Figure 3. Alignment of the predicted catalytic domain amino acid sequences of MLK’s 1, 2 and 3. Residues
conserved in all three proteins are in bold type, conservative replacements in normal type and non-conservative
replacements in shaded boxes. Roman numerals refer to protein kinase catalytic subdomains as delineated
by Hanks et al. (1988). The lysine residue conserved in members of the serine/threonine specific class is marked
with a star.
Table 1. Relationship of the MLK1 catalytic domain to other human protein kinases: pairwise similarity
scores

Tyr PKs     Score     Ser/Thr PKs     Score
c-Ros        73       c-Raf            63
c-Abl        71       PKC              55
EGFR         69       DSRNA            54
c-Src        68       c-Mos            52
TRK          68       cGMPDK           50
PDGFR        63
InsR         61

Pairwise similarity scores are derived from the sequence alignment program CLUSTAL (Higgins and Sharp, 1988)
and represent the number of absolute identities between two sequences minus a penalty for gaps introduced to
maximise the alignment of the sequences. Scores are from comparisons between MLK1 and human c-Ros, c-Abl,
c-Src, c-Raf, c-Mos (proto-oncogenes), TRK (colon carcinoma oncogene product), EGFR (epidermal growth
factor receptor), PDGFR (platelet derived growth factor receptor), InsR (insulin receptor), PKC (protein
kinase C), DSRNA (double stranded RNA activated protein kinase) and cGMPDK (cyclic GMP dependent kinase).
Table reprinted from Dorow et al., 1993.


Comparison of the catalytic domain amino acid sequences of the MLK’s (Figure 3)
reveals that the three family members share about 75% identity (85% if conservative
substitutions are considered), with no gaps needed to align the sequences completely. The
spatial arrangement of catalytic domain motifs, therefore, supports the notion
that MLK1 represents a new and distinct family of protein kinases. As the MLK3 protein
has been shown to auto-phosphorylate on serine and threonine (Gallo et al., 1994), the high
degree of amino acid conservation within the family suggests that MLK’s 1 and 2 are also likely
to have serine/threonine specificity. As discussed by Gallo et al. (1994), MLK3 is the
first SH3 domain containing kinase shown to have Ser/Thr specificity. This, together with the
close structural similarity of the MLK catalytic domain to that of the tyrosine kinases,
distinguishes the MLK’s from any previously described family of serine/threonine specific kinases.

The MLK1 protein kinase catalytic domain sequence was aligned with a series of
other human protein kinase domain sequences using the alignment program CLUSTAL
(Higgins and Sharp, 1988). The similarities between MLK1 and each of the other protein
kinase domain sequences are presented as pairwise similarity scores in Table 1. These data
show that the strongest similarities of the MLK1 protein kinase catalytic domain are to the
tyrosine kinases c-Ros, c-Abl, EGFR and c-Src. Among the serine/threonine protein kinases,
MLK1 is most closely related to c-Raf. Most members of the protein kinase families to which
the MLK’s show strongest catalytic domain homology are either oncogene products with
transforming ability or growth factor receptors.
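The score definition given in the footnote to Table 1 (identities minus a penalty for gaps) can be sketched as below for two sequences that are already aligned. This is an illustration only: charging a penalty of 3 per contiguous gap run is an assumed choice, and CLUSTAL's own alignment procedure and parameters are not reproduced here; the function name is hypothetical.

```python
def similarity_score(a: str, b: str, gap_penalty: int = 3) -> int:
    """Identities minus a penalty per gap for two pre-aligned sequences.

    Hypothetical illustration: '-' marks a gap, and each contiguous run of
    gaps in either sequence is charged `gap_penalty` (an assumed value).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both residues are the same and are not gaps.
    identities = sum(1 for x, y in zip(a, b) if x == y != "-")
    gap_runs = 0
    for seq in (a, b):
        previous_was_gap = False
        for ch in seq:
            if ch == "-" and not previous_was_gap:
                gap_runs += 1  # start of a new gap run
            previous_was_gap = ch == "-"
    return identities - gap_penalty * gap_runs


print(similarity_score("ACD--G", "ACDEFG"))  # 4 identities - 3 = 1
```

Penalising gap runs rather than individual gap positions mirrors the usual intuition that one long insertion is a single evolutionary event; a per-position penalty would be an equally simple variant.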

Alignment of the MLK1 catalytic domain with those of human Ros, EGFR and Raf is
shown in Figure 4. At approximately 35% of positions in the MLK1 catalytic domain, amino
acid residues are identical to Ros, 32% to EGFR and 27% to Raf. When conservative amino
acid substitutions are considered, this becomes 47%, 45% and 41% similarity to Ros, EGFR
and Raf, respectively. It can be seen from the number of gaps needed to align the sequences,
however, that the spacing between the conserved areas varies extensively from one sequence
to another. The alignment of amino acids within protein kinase catalytic domains is, with
some exceptions, conserved among members of each kinase family (Hanks et al., 1988). The
catalytic domain sequence arrangement, therefore, supports the notion that the MLK’s form
a new and distinct family of protein kinases.

Figure 4. Alignment of the catalytic domain amino acid sequence of MLK1 with those of human Ros, EGFR
and Raf. Alignments were done with the sequence alignment program CLUSTAL (Higgins and Sharp, 1988).
Conserved amino acids are in shaded boxes and conservative replacements are in italics. Gaps, introduced to
maximise the alignment, are denoted by dashes. Figure reprinted from Dorow et al., 1993.
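Percent identity and similarity figures of the kind quoted above can be computed from an alignment as sketched below. The conservative-substitution groups are a hypothetical choice, since the grouping used for Figure 4 is not stated, and positions are counted over the non-gap residues of the first (query) sequence, matching the phrase "positions in the MLK1 catalytic domain".

```python
# Hypothetical conservative-substitution groups (one common grouping by
# side-chain chemistry); the scheme actually used for Figure 4 is not stated.
CONSERVATIVE_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST"]
GROUP_OF = {aa: i for i, group in enumerate(CONSERVATIVE_GROUPS) for aa in group}


def is_conservative(x: str, y: str) -> bool:
    """True if both residues fall in the same assumed group."""
    gx = GROUP_OF.get(x)
    return gx is not None and gx == GROUP_OF.get(y)


def identity_and_similarity(query: str, other: str) -> tuple[float, float]:
    """Percent identity and similarity over the non-gap positions of `query`.

    Both strings are rows of the same alignment, with '-' for gaps.
    """
    if len(query) != len(other):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(query, other) if x != "-"]
    identical = sum(1 for x, y in columns if x == y)
    similar = sum(1 for x, y in columns if x == y or is_conservative(x, y))
    n = len(columns)
    return 100 * identical / n, 100 * similar / n


print(identity_and_similarity("AKLD", "AREE"))  # (25.0, 75.0)
```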

The  Double  Leucine  Zipper  and  Basic  Domain 

A further notable feature of the MLK protein sequences is the presence of two closely
spaced leucine/isoleucine zipper motifs, followed by a basic domain, C-terminal to the
catalytic domain. This double leucine zipper region has a novel structure not previously
reported for any protein. In the classic leucine zipper, first described by Landschulz et al.
(1988), leucine residues are repeated with a heptad periodicity over a stretch of at least 22
amino acids. This sequence, which has a higher than average representation of charged
residues, is postulated to form a helix with a hydrophobic ridge of leucines down one face.
This motif, which is most commonly found in transcription factors, promotes dimerization
(O’Shea et al., 1991) through hydrophobic interactions between heptad leucines. Such
dimerization appears to activate DNA binding by orienting the basic side chains of DNA
binding residues for correct contact with the DNA. The strongly helix-breaking amino acids
proline and glycine are under-represented in zipper motifs (McKnight, 1991). The
leucine/isoleucine zipper is a variant in which isoleucine replaces leucine at some positions
in the heptad repeat (Atkinson et al., 1991).
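The heptad rule can be expressed as a simple scan: find runs in which Leu or Ile recurs at every seventh residue over at least 22 residues (positions i, i+7, i+14 and i+21). A minimal sketch; it ignores the other features of real zippers (charge content, absence of Pro/Gly), and the function name and test sequences are hypothetical.

```python
def find_heptad_zippers(seq: str, min_length: int = 22,
                        heptad_residues: str = "LI") -> list[tuple[int, int]]:
    """Find stretches in which every 7th residue is Leu or Ile.

    Returns half-open (start, end) intervals, 0-based, running from the first
    to the last heptad residue of each maximal run. With min_length=22 a run
    needs four heptad residues at i, i+7, i+14 and i+21.
    """
    seq = seq.upper()
    hits = []
    for i, aa in enumerate(seq):
        # Only start at the first residue of a run (no heptad partner 7 back).
        if aa not in heptad_residues or (i >= 7 and seq[i - 7] in heptad_residues):
            continue
        last = i
        while last + 7 < len(seq) and seq[last + 7] in heptad_residues:
            last += 7
        if last - i + 1 >= min_length:
            hits.append((i, last + 1))
    return hits


print(find_heptad_zippers("LEEKAEK" * 3 + "L"))  # [(0, 22)]
```

Passing heptad_residues="L" restricts the scan to the classic leucine zipper; the default also accepts the leucine/isoleucine variant.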

An alignment of the leucine zipper domains of the MLK’s is shown in Figure 5. Each
zipper motif contains 22 amino acids and the two motifs are separated by a 13-residue
“spacer” segment. The entire structural domain covers a region of 57 amino acids from which
proline and glycine are totally absent in MLK’s 1 and 2, while in MLK3 there is only one
glycine, located in zipper motif #1. MLK3 also has a valine substitution at the third heptad
position of zipper #2, while all other heptad positions in both zipper motifs of the three
proteins are occupied by either leucine or isoleucine. In each of the three proteins, charged
amino acids occupy about 50% of positions within the double zipper domain. Among the
charged amino acids, acidic residues are the most prevalent, yielding a net negative charge
within the double zipper domain.

Figure 5. Alignment of the leucine zipper motifs, spacer and basic domains of MLK’s 1, 2 and 3. Heptad
leucine/isoleucine residues and basic domain residues are in shaded boxes. A possible nuclear localisation
signal is double shaded.
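The composition features listed above (about 50% charged positions, a net negative charge, and no proline or glycine in a 22 + 13 + 22 = 57 residue domain) can be tallied as follows. The MLK sequences themselves are not reproduced here, so the example peptide is hypothetical, and His is counted as uncharged by assumption.

```python
# Side-chain charges at neutral pH; His is treated as neutral (an assumption).
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def zipper_domain_summary(seq: str) -> dict:
    """Composition features discussed for the MLK double zipper domain."""
    seq = seq.upper()
    charged = [aa for aa in seq if aa in CHARGE]
    return {
        "length": len(seq),  # 57 for the MLK double zipper domain
        "fraction_charged": len(charged) / len(seq),
        "net_charge": sum(CHARGE[aa] for aa in charged),
        "pro_gly_count": sum(1 for aa in seq if aa in "PG"),  # helix breakers
    }


print(zipper_domain_summary("LEEKLEDIAKRLEE"))  # hypothetical peptide
```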

The degree of amino acid conservation between the MLK proteins is more varied
within the double zipper domain than within the catalytic domain (see Figure 5). While the zipper
domain sequences of MLK’s 1 and 2 share 74% identity, similar to that of their catalytic
domains, residues in this domain are less highly conserved in MLK3, with 58% identity to
MLK1 and 68% to MLK2. Within zipper motif #2, MLK’s 1 and 2 are particularly similar,
with 19 of 22 positions having identical amino acids and two of the three substitutions being
conservative. There is also a very high degree of conservation of charged residues in the
double zipper domain. Of 27 charged amino acids within the double zipper domain of MLK1,
there are only two replacements in MLK2, and one of these is conservative. Conservation is
slightly lower for MLK3, with 5 non-conservative substitutions of charged residues compared
with MLK’s 1 and 2. This high degree of amino acid conservation attests to the probable
importance of the charged residues to the activity of the MLK proteins.

When arrayed on helical wheel templates (Schiffer and Edmundson, 1967), the two
zipper motifs of MLK1 (Figure 6) differ slightly from one another in the positioning of
charged  amino  acids.  In  motif  #1,  one  side  of  the  helix,  at  wheel  positions  2,  5  and  6,  has  a 
preponderance  of  charged  residues,  creating  a  highly  charged  surface  on  that  entire  side  of 
the  putative  helix.  Position  5  is  acidic,  2  is  basic  and  6  contains  both  basic  and  acidic  amino 
acids.  In  motif  #2,  the  leucine/isoleucine  stripe  is  flanked  on  one  side  by  three  arginine 
residues  and  on  the  other  by  three  glutamic  acid  residues.  This  creates  basic,  hydrophobic 
and  acidic  stripes  along  the  length  of  the  putative  helix. 
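The wheel layout described in the Figure 6 caption follows from the helix geometry: seven residues per two turns means each residue advances 720/7, or about 103, degrees, which is two spokes of a seven-spoke wheel. A minimal sketch; spokes are numbered from 0 here, which need not match the position numbering used in Figure 6, and the test peptide is hypothetical.

```python
def helical_wheel(seq: str) -> dict[int, str]:
    """Place residues of an idealised helix on a 7-spoke wheel.

    With seven residues per two turns, each residue advances 720/7 degrees,
    i.e. two spokes, so residue i lands on spoke (2 * i) % 7. Residues that
    share a spoke are listed from the wheel centre outwards.
    """
    wheel = {spoke: "" for spoke in range(7)}
    for i, aa in enumerate(seq):
        wheel[(2 * i) % 7] += aa
    return wheel


# Hypothetical 15-residue peptide with Leu at heptad positions 0, 7 and 14.
wheel = helical_wheel("LEKAEKRLEKAEKRL")
print(wheel[0])  # LLL
```

Residues seven apart always share a spoke, which is why heptad leucines line up as a single hydrophobic stripe and charged residues at fixed heptad offsets form the flanking stripes.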

The positioning of charged residues within the MLK zipper motifs is similar to that in the
zipper region of the cGMPDK (Landgraf et al., 1990; Atkinson et al., 1991). This is more
striking for MLK zipper motif #2, in which the acidic (Gln-Glu-Glu-Glu) and basic
(His-Arg-Arg-Arg) stripes flank the ridge of heptad leucine/isoleucine residues in the helical
configuration (see Figure 6). In the cGMPDK zipper sequences, there are similarly placed
acidic (Ala-Glu-Glu-Glu) and basic/hydrophobic (Lys-Leu-Lys-Leu) stripes (Atkinson et
al., 1991). The cGMPDK leucine/isoleucine zipper sequence forms a helix in solution which
is more stable to heat or acid conditions than that of the transcription factor-type leucine
zipper, as measured by circular dichroism (Atkinson et al., 1991). Furthermore, the stripes
of charged residues within the cGMPDK zipper are thought to contribute to electrostatic
stabilisation of the helical conformation (Landgraf et al., 1990).

Figure 6. Helical wheel representation of the leucine zipper motifs of MLK1. The spokes of the wheel show
schematically the relative positions of the amino acids of the zipper motifs in an idealised alpha-helix. The
seven positions correspond to a typical alpha-helix with seven residues to every 2 helical turns. The most
N-terminal heptad position is the residue closest to the center at the top spoke. Positions proceed around the
wheel to the right, skipping every second spoke. Residues in each succeeding 2 turns of the helix are in the
second, third and fourth positions out from the center at each spoke. Figure reprinted from Dorow et al., 1993.

Another motif which has recently been described in DNA binding proteins is the
helix-loop-helix (HLH) (Murre et al., 1989). This motif is characterised by two putative
amphipathic alpha-helices, separated by a short loop region, in a span of approximately 60
amino acids. It has been identified in a number of proteins involved in gene
regulation. The HLH domain bears some resemblance to the MLK double zipper domain, with its two
closely spaced but distinct zipper motifs. Based on the similarity of the MLK zipper domains to the
cGMPDK leucine/isoleucine zipper, with its well characterised helical conformation (Landgraf et al.,
1990; Atkinson et al., 1991), the MLK zippers would be expected to form stable helices. The 13-residue
spacer sequence between the two helical segments suggests a possible break in the
helical nature of this domain. We therefore used a structural prediction algorithm (Chou and
Fasman, 1978) to define possible secondary structures for the MLK1 zipper domain sequence.
In this analysis the two zipper motifs were indeed predicted to be helical, with a turn
conformation predicted in the spacer region (Dorow et al., 1993). If this arrangement occurs
in the folded protein, the two zipper helices within a single MLK protein could interact with
each other in an anti-parallel “zipper-turn-zipper” configuration analogous to the HLH helices.
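A much simplified version of propensity-based prediction is sketched below: each residue's Chou-Fasman helix parameter P(alpha) is averaged over a six-residue window, and every residue covered by a window averaging at least 1.03 is marked helical ("H"). This omits the nucleation, extension and beta-sheet competition rules of the full Chou and Fasman (1978) procedure, so it is not the analysis reported by Dorow et al. (1993); the window size, threshold and function name are assumptions.

```python
# Chou-Fasman helix propensities P(alpha) for the 20 amino acids.
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def helix_profile(seq: str, window: int = 6, threshold: float = 1.03) -> str:
    """Mark residues covered by any window whose mean P(alpha) >= threshold.

    Returns a string of 'H' (helix-favouring) and '-' of the same length.
    Unknown residue letters raise KeyError.
    """
    seq = seq.upper()
    flags = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        mean = sum(P_ALPHA[aa] for aa in seq[i:i + window]) / window
        if mean >= threshold:
            for j in range(i, i + window):
                flags[j] = True
    return "".join("H" if f else "-" for f in flags)


print(helix_profile("AAAAAAGGGGGGGGGGGG"))  # HHHHHHHH----------
```

In this toy case the helix-favouring alanines carry the signal two residues into the glycine stretch before the window mean falls below the threshold, which illustrates why real predictors add explicit termination rules.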

A further region of interest in the MLK structures is an extremely basic sequence that
begins 9 residues C-terminal to the final heptad isoleucine of the double zipper domain (see
Figure 5). In this 15 amino acid stretch there are 10 basic (Lys or Arg), 4 hydrophobic and
no acidic amino acids. Like the double zipper domain, the basic region shows a high degree
of charge conservation between the sequences of the MLK’s. While there are several
replacements within the 15 amino acids, there is only one substitution (Lys to Asn or Thr)
which changes the basic nature of the amino acid side chain. Although this basic sequence has
some similarity to the DNA binding domains of the transcription factors (Vinson et al., 1989),
it also has several significant differences, including its location on the C-terminal rather
than the N-terminal side of the leucine zipper. Furthermore, the MLK basic region contains
the sequence Val-Lys/Arg-Lys-Arg-Lys-Gly, which is very similar to the nuclear localisation
signal of the SV40 large T antigen (Pro-Lys-Lys-Lys-Arg-Lys-Val [Kalderon et al., 1984]).
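The basic-region features described above can be checked with a short sketch that counts basic and acidic residues and flags runs of four or more consecutive Lys/Arg. Treating such a run as "NLS-like" is an assumed heuristic, not a validated nuclear localisation predictor; the peptides tested are the SV40 signal (PKKKRKV) and the two MLK variants quoted in the text (VKKRKG and VRKRKG, for Val-Lys/Arg-Lys-Arg-Lys-Gly).

```python
import re

# Assumed heuristic: a run of four or more consecutive Lys/Arg residues is
# flagged as a candidate SV40-like nuclear localisation signal.
BASIC_CLUSTER = re.compile(r"[KR]{4,}")


def basic_region_summary(seq: str) -> dict:
    """Count basic (K/R) and acidic (D/E) residues and list basic clusters."""
    seq = seq.upper()
    return {
        "basic": sum(1 for aa in seq if aa in "KR"),
        "acidic": sum(1 for aa in seq if aa in "DE"),
        "nls_like_clusters": BASIC_CLUSTER.findall(seq),
    }


# SV40 large T NLS and the two MLK variants quoted in the text.
for peptide in ("PKKKRKV", "VKKRKG", "VRKRKG"):
    print(peptide, basic_region_summary(peptide))
```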

The  C-terminal  Region 

Both MLK’s 2 and 3 have C-terminal domains which are rich in proline and
serine/threonine (data not shown). This C-terminal domain is not coded in the MLK1 cDNA clone
from human colon; however, it is present in mouse brain MLK1 and in human MLK2. Because of its
presence in the mouse MLK1 sequence, this domain would be expected to exist in the human MLK1
protein as well. It is possible that the truncated or rearranged human colonic MLK1 cDNAs
represent RNAs that have been spliced or edited in human colonic cells. It cannot be ruled out,
however, that they are artefacts that arose during preparation of the colonic cDNA library.

While there is considerable similarity within the C-terminal domain among three of
the MLK proteins (mouse brain MLK1, human colon and brain MLK2 and human MLK3
from several sources), it is not as striking as in the SH3, catalytic, zipper and basic domains.
Sequences within this domain, therefore, may be involved in conferring specificity on these
three similar molecules. The C-terminal domain of mouse MLK1 contains consensus
sequences for phosphorylation by a number of known protein kinases, including casein
kinase II, protein kinase C and cyclic AMP dependent protein kinase. The C-terminal domain
of MLK3 also contains sequences similar to the proline rich consensus for SH3 domain
binding (Gallo et al., 1994; Ing et al., 1994). These features could allow the
C-terminal domain to regulate protein interactions involving the MLK-SH3 domain.
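A scan for the kinds of consensus sites mentioned above might look like the sketch below. The minimal motifs (R-R-X-S/T for cyclic AMP dependent protein kinase, S/T-X-R/K for protein kinase C, S/T-X-X-D/E for casein kinase II and P-X-X-P for SH3 ligands) are commonly cited simplifications assumed for illustration; the MLK C-terminal sequences are not given here, so the test peptide is hypothetical.

```python
import re

# Minimal consensus motifs commonly cited for these kinases and for SH3
# ligands (illustrative assumption; real site prediction needs more context).
# Zero-width lookaheads let overlapping sites all be reported.
MOTIFS = {
    "PKA (R-R-X-S/T)": re.compile(r"(?=RR.[ST])"),
    "PKC (S/T-X-R/K)": re.compile(r"(?=[ST].[RK])"),
    "CK2 (S/T-X-X-D/E)": re.compile(r"(?=[ST]..[DE])"),
    "SH3 ligand core (P-X-X-P)": re.compile(r"(?=P..P)"),
}


def scan_motifs(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif in `seq`."""
    seq = seq.upper()
    return {name: [m.start() for m in pat.finditer(seq)] for name, pat in MOTIFS.items()}


print(scan_motifs("RRASPAAPSEEDK"))  # hypothetical peptide
```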


SUMMARY 

In this report, we have described the structure of three MLK’s, members of a new
family of human protein kinases. These are the first kinases described with an SH3 domain
in the absence of an SH2 domain, and MLK3 is the first SH3 domain containing kinase to
be shown to have serine/threonine specificity. In addition, the MLK’s have a unique double
leucine zipper/basic domain that has not been found in any other protein to date.

In the current model for many signal transduction pathways, activated receptors
recruit protein kinases and adaptor molecules that contain SH2 and SH3 domains. This is
accompanied by a general rise in protein kinase activity and in interactions among proteins
residing near the plasma membrane and/or bound to cytoskeletal elements. Many of these
proteins contain SH3 domains. Among other effects, this leads to G-protein activation and
guanine-nucleotide exchange, generating a signal which translocates into the nucleus to
trigger complex formation of transcription factors through their leucine zippers. This
activates DNA binding and gene transcription. Phosphorylation plays a role at each step of
this multi-faceted process. As well as a novel type of kinase catalytic domain structure, the
MLK’s contain an SH3 domain, leucine zippers and a possible nuclear localisation signal.
This makes them unusual among the protein kinases in that they may act in several
compartments of the pathway through their distinct domains. Defining the role of this
new family of biological control molecules in signal transduction pathways may yield new
insights into the regulation of cellular events.
INTRODUCTION

In the vertebrate brain, GABAA receptors on postsynaptic membranes are the major
transducers of fast inhibitory neurotransmission. These receptors are hetero-pentameric
proteins that provide specific binding sites for GABA, benzodiazepines, barbiturates, and
anesthetic steroids as well as an integral chloride channel (Macdonald and Olsen, 1994).
Chloride channel openings are gated by GABA and allosterically potentiated by
benzodiazepines and other anxiolytic and hypnotic drugs. The exceptionally rich pharmacology
associated with GABAA receptors has evoked considerable interest in their structure and
function. Chronic administration of many GABAA-ergic compounds in humans and animals
produces syndromes of dependence and tolerance which limit their clinical value. Since the
development of tolerance to GABAA-ergic drugs is attributed to functional rather than
pharmacokinetic accommodation (Greenblatt and Shader, 1986), attention has focused on
use-dependent modifications of GABAA receptors. Indeed, there is general agreement that
chronic exposure of rodents to benzodiazepines produces a decline in GABAA receptor
function which coincides with the onset of tolerance (Miller et al., 1988; Marley and
Gallager, 1989; Lewin et al., 1989). However, the molecular events which underlie this loss
of receptor function are not well defined.

Our understanding of the use-dependent regulation of GABAA receptors has been
greatly aided by studies of cortical neurons in tissue culture. GABAA-ergic ligands bind
readily to receptor domains on the surface of these cells, permitting biophysical and
biochemical characterization of receptor function. After the binding of GABAA receptor
agonists, the most rapid regulatory process is desensitization. Desensitization of these
receptors occurs within seconds and is rapidly reversible following agonist removal. Because
desensitization can be demonstrated in isolated membrane patches at room temperature
(Hamill et al., 1983; Weiss, 1988), the mechanism probably involves a reduction in intrinsic
GABAA receptor channel activity rather than receptor removal from the surface. After
exposure of cortical neurons to GABA or the benzodiazepine clonazepam for 1 hr at 37°C,
an increase in the intracellular fraction of 3H-flunitrazepam binding sites has been reported
(Tehrani and Barnes, 1991). That the sequestration of GABAA receptors occurs in vivo is
suggested by ligand binding to clathrin-coated vesicles (Tehrani and Barnes, 1993). Following
administration of lorazepam to mice, the level of GABAA receptors on clathrin-coated
vesicles increases while that on synaptic membranes decreases (Tehrani and Barnes, 1994).
Although it appears likely that the sequestered GABAA receptors are derived from the neuronal
surface, this has not yet been demonstrated.

It is well known that chronic (several days) exposure of cortical neurons to GABA
reduces the density of GABAA receptor ligand binding sites (Maloteaux et al., 1987; Tehrani
and Barnes, 1988; Roca et al., 1990). This process, referred to as down-regulation, is
accompanied by persistent losses of spontaneous inhibitory postsynaptic currents as well as
of chloride currents evoked by applied GABA (Hablitz et al., 1989). An agonist-dependent
reduction in receptor number could be explained by a decrease in receptor biosynthesis or
by an increase in degradation. Since GABAA receptor subunit mRNAs are also subject to
down-regulation by GABA (Montpied et al., 1991; Baumgartner et al., 1994; Mhatre and
Ticku, 1994), changes in the synthesis or stability of receptor transcripts appear to be part
of the control mechanism. However, after administration of GABA, the reduction in ligand
binding sites precedes that of the subunit mRNAs (Baumgartner et al., 1994), leading to the
hypothesis that translational or post-translational regulation may be important in the initial
phase of down-regulation.

In order to investigate the down-regulation of GABAA receptor polypeptides from
the neuronal surface, we have used the impermeant, cleavable labeling reagent 125I-DPSgt
(3,3'-dithiopropionyl 1-sulfosuccinimidyl T-glycyltyrosine) (Bretscher and Lutter, 1988)
in combination with quantitative immunoprecipitation (Calkin and Barnes, 1994). We report
here the application of this technique to examine the agonist-induced sequestration and
subsequent degradation of GABAA receptor polypeptides.


METHODS  AND  RESULTS 

Effects of Chronic Exposure to GABAA Receptor Agonists

Neuronal cell cultures from the embryonic chick cerebral cortex were prepared as
described by Tehrani and Barnes (1991). Living neurons were chronically treated with
GABA or other agonists by adding a single dose to the culture medium and returning
the cells to the incubator for 5 days. Neurons from the same preparation but without agonist
addition were used as controls. The cell monolayers on Petri dishes were washed and
incubated with 125I-DPSgt at 4°C (Calkin and Barnes, 1994). Analysis of the major
125I-peptides by SDS-PAGE showed that the GABA treatment produced no detectable
difference in the labeling pattern. Furthermore, the total incorporation of 125I into cellular
protein was not changed by exposure of the neurons to GABA or to the GABAA receptor
agonists isoguvacine and THIP (4,5,6,7-tetrahydroisoxazolo[5,4-c]pyridin-3-ol). Consistent
with the structure and known properties of 125I-DPSgt (Bretscher and Lutter, 1988), washing
the intact cells with GSH (glutathione) buffer removed essentially all of the detectable 125I.
This shows that the 125I label is initially confined to polypeptides on the neuronal cell surface.
Figure 1. Labeling of surface GABAA receptor polypeptides with 125I-DPSgt.
Cultured cortical neurons were washed and labeled with 125I-DPSgt for 30 min
at 4°C. The medium was removed and the cells were extracted with Tris-buffered
saline containing 1% Triton X-100, 0.1% SDS, and protease inhibitors
(Calkin and Barnes, 1994). In lane 3, the cells were washed at 4°C with buffer
containing 100 mM GSH before extraction. The clarified extracts were incubated
with 3 µl of preimmune serum (lane 1) or antiserum RB4 (lanes 2 and 3) and then
mixed and incubated further with 40 µl Staphylococcus A cells (10% w/v). The
immunoprecipitates were run on 10% polyacrylamide-SDS gels which were then
dried and autoradiographed.

[Autoradiograph: lanes 1-3; 53- and 50-kDa bands marked.]


GABAA receptor polypeptides with 125I-labeled surface domains were isolated by
Triton X-100 extraction and immunoprecipitation with polyclonal antiserum RB4, an
antibody directed against the native receptor. Antiserum RB4 quantitatively precipitates
GABAA receptor binding sites for 3H-muscimol and 3H-flunitrazepam from Triton extracts
of cultured neurons and cross-reacts with 50-54-kDa subunits of the affinity-purified
receptor (Calkin and Barnes, 1994). The RB4 immunoprecipitates from extracts of
DPSgt-labeled neurons contained labeled 50- and 53-kDa polypeptides which were not
found in preimmune controls (Fig. 1). The mass of these polypeptides is similar to that of the
major RB4 cross-reactive subunits of the GABAA receptor antigen. When the labeled cells were
washed with GSH buffer prior to extraction, essentially all of the radioactivity was removed
from these proteins (Fig. 1). Thus, the 50- and 53-kDa 125I-polypeptides arise from GABAA
receptor subunits which contain domains exposed at the outer surface of the cells.

After chronic (5-day) treatment of a set of cultures with agonists (100 µM final
concentration in the growth medium), washed intact neurons were labeled with 125I-DPSgt
as before. This exposure to GABA caused a decline in the surface 50- and 53-kDa GABAA
receptor 125I-subunits compared to the untreated controls (Fig. 2). Since the specific GABAA
receptor antagonist R5135 (3α-hydroxy-16-imino-5β-17-aza-androstan-11-one) prevented
this decline, the GABAA receptor appears to have a role in signaling its own down-regulation.
Similar chronic treatments were carried out with the specific GABAA receptor agonists
isoguvacine and THIP. The combined autoradiographic density of the 50- and 53-kDa
125I-polypeptides was determined and used to quantify the extent of GABAA receptor
down-regulation produced by these agents (Table 1). Both GABA and isoguvacine caused
a substantial decrease in the amount of labeled proteins, while THIP was much less effective.
This is in accord with the effects of these agents on GABAA receptor channels in our


[Autoradiograph: lanes Control, GABA, R5135, GABA+R5135; 53- and 50-kDa bands marked.]

Figure 2. Levels of surface GABAA receptor polypeptides on neurons chronically exposed to GABA and
R5135. Where indicated, GABA (100 µM) and R5135 (1 µM) were added to the culture medium and the cells
returned to the incubator for 5 days. These additions were omitted for the samples in the control lane. The cells
were then washed and labeled with 125I-DPSgt, and the GABAA receptor polypeptides were analyzed using
RB4 immunoprecipitation as in Fig. 1.
Table 1. Effect of chronic agonist exposure on
GABAA receptor surface polypeptides

Treatment      125I-Receptor peptides (%)    n
None           100 ± 10.5                    7
GABA           37.5 ± 3.2*                   7
Isoguvacine    52.4 ± 4.9*                   4
THIP           84.2                          2

Experiments were carried out as described for Fig. 2. Cells
were treated for 5 days with the compound indicated at a
100 µM final concentration and then labeled with 125I-DPSgt.
Extracts were immunoprecipitated and analyzed on
gels. Regions of the autoradiographs corresponding to
50-53 kDa (Fig. 2) were quantified as a single band by
digital optical analysis. The data are expressed as a
percentage of controls without agonist and represent the
mean ± S.E. of the number of experiments indicated. *p <
0.01 compared to control. The difference between GABA
and isoguvacine treatments was not statistically
significant.

preparations. Application of GABA and isoguvacine induces robust chloride currents, but the
responses to THIP are much weaker (Mistry and Hablitz, 1990). We have also examined the
down-regulation of the total cellular pool of GABAA receptor polypeptides by DPSgt
iodination of membranes from 100,000 × g pellets of neuronal homogenates. The membrane
level of 50-53-kDa 125I-subunits in the GABA- and isoguvacine-treated cells represented
36% and 53%, respectively, of that in the untreated controls. Comparable results were obtained
when isolated membranes were iodinated using chloramine T. This suggests that during
chronic agonist exposure the down-regulated surface receptor subunits are not retained in
an intracellular pool.
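The figures above are ratios to untreated controls. A minimal sketch of that normalization, assuming background-subtracted densitometry values (the function names are hypothetical, not part of the original analysis), also converts the reported remaining levels into percent reductions:

```python
def percent_of_control(treated: float, control: float) -> float:
    """Express a treated-sample signal as a percentage of the untreated control."""
    if control <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * treated / control


def percent_reduction(percent_remaining: float) -> float:
    """Convert 'percent of control remaining' into 'percent lost'."""
    return 100.0 - percent_remaining


# Membrane 125I-subunit levels from the text, already expressed as % of control.
remaining = {"GABA": 36.0, "isoguvacine": 53.0}
for agent, pct in remaining.items():
    print(f"{agent}: {percent_reduction(pct):.0f}% reduction")
```

With hypothetical raw densities, `percent_of_control(375.0, 1000.0)` gives 37.5, the form in which Table 1 reports its values.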

In order to compare these results with the down-regulation of GABAA receptor
binding sites, we first measured the binding of 3H-flunitrazepam and 35S-TBPS
(t-butylbicyclophosphorothionate), a ligand for the GABAA receptor channel. After chronic treatment
of the neurons with GABA or isoguvacine, the binding of both ligands to isolated membranes
was substantially reduced (Table 2). As before, THIP had little effect. To examine the
intracellular binding sites after chronic treatments, we labeled intact neurons with 3H-flunitrazepam,
a membrane-permeant ligand, and displaced the surface radioactivity with an
impermeant benzodiazepine, SPTC-1012S (Tehrani and Barnes, 1991). The internal receptor
binding sites determined in this manner represent 7.5% of the cellular total in untreated neurons (Table 3).
Consistent with the 125I-labeling experiments, the levels of intracellular receptors (measured
as fmol ligand bound/mg protein) did not change significantly after chronic agonist exposure.
Nevertheless, the treatments doubled the fraction of internal/total receptor binding sites.
Because the intracellular values were unchanged, this increase reflects the decline in surface
receptors rather than an accumulation of internal receptors.


Table 2. Effect of chronic agonist treatment on GABAA receptor
ligand binding to isolated membranes

Treatment      3H-Flunitrazepam binding      35S-TBPS binding
               %              n              %              n
None           100 ± 4.0      16             100 ± 6.3      7
GABA           68.8 ± 1.4*    8              55.5 ± 3.3*    7
Isoguvacine    57.0 ± 3.5*    6              47.4 ± 4.2*    7
THIP           95.8 ± 5.0     4              95.2 ± 5.5     4

Cells were treated with agonist as in Table 1. The monolayers were washed, and
crude membranes were isolated and assayed for radioligand binding as
described by Calkin and Barnes (1994). Results are expressed as a percentage
of controls without agonist and represent the mean ± S.E. from the indicated
number of experiments. *p < 0.01 relative to untreated control.


Table 3. Effect of chronic agonist treatment on 3H-flunitrazepam
binding to intact neurons

Treatment      Total receptor         Intracellular receptor      Intracellular/total
               fmol/mg        n       fmol/mg          n
None           67.9 ± 1.2     15      5.07 ± 0.65      15         0.075
GABA           42.0 ± 2.4*    15      6.57 ± 1.09      14         0.156
Isoguvacine    35.1 ± 1.7*    14      5.27 ± 1.30      15         0.150

Cells were treated with agonist as in Table 1. The monolayers were washed and
incubated with 1 nM 3H-flunitrazepam. Nonspecific binding was determined using
1 µM benzodiazepine 1012-S and intracellular binding with 1 µM SPTC-1012S as
described by Tehrani and Barnes (1991). The results are expressed per mg cell
protein and represent the mean ± S.E. of the indicated number of experiments.
*p < 0.01 relative to untreated control.
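The intracellular/total column of Table 3 is a simple ratio of the two binding measurements. The sketch below recomputes it from the tabulated means and compares each treatment with the untreated value; it is only an illustration that uses the published means and ignores their standard errors.

```python
# Mean 3H-flunitrazepam binding (fmol/mg protein) taken from Table 3.
binding = {
    "None": {"total": 67.9, "intracellular": 5.07},
    "GABA": {"total": 42.0, "intracellular": 6.57},
    "Isoguvacine": {"total": 35.1, "intracellular": 5.27},
}


def internal_fraction(total: float, intracellular: float) -> float:
    """Fraction of all binding sites that are intracellular."""
    return intracellular / total


fractions = {t: internal_fraction(v["total"], v["intracellular"]) for t, v in binding.items()}
baseline = fractions["None"]
for treatment, frac in fractions.items():
    # Ratio to the untreated fraction shows the roughly twofold increase.
    print(f"{treatment}: {frac:.3f} ({frac / baseline:.1f}x control)")
```

Rounded to three decimals, this reproduces the 0.075, 0.156 and 0.150 values in the table. The increase comes from the smaller denominator (total sites), not from a larger numerator.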

Effects  of  Acute  Application  of  Agonists 

The DPSgt labeling procedure was also employed to study the fate of surface GABAA
receptor subunits during acute exposure of the neuronal cultures to agonists. Cells grown in
normal medium were 125I-labeled with DPSgt at 0°C, incubated in culture with 200 µM
GABA for 2 or 4 hr at 37°C, and then washed with GSH buffer. Extracts of the neurons were
analyzed as before. Labeled 50- and 53-kDa receptor polypeptides that were protected from
GSH cleavage were recovered in significant amounts from cells acutely exposed to GABA
but not from the untreated controls (Fig. 3). Densitometric analysis of the autoradiographs
revealed that 16.3 ± 2.4% (n = 3) of the surface polypeptides were internalized (protected)
during the 2 hr GABA treatment (Fig. 4). Much lower amounts of the labeled subunits (<3%
of those remaining at the surface) were sequestered by cells incubated with GABA
for 2 hr at 4°C (Fig. 4). From cells exposed to GABA plus R5135 for 2 hr at 37°C, or from controls
without GABA, internalized polypeptides were also barely detectable (not shown). Because


Figure 3. Sequestration of GABAA receptor polypeptides by acute GABA application.
Neurons were cultured for 6 days without exogenous GABA, labeled with 125I-DPSgt at
4°C as in Fig. 1, and then incubated in the presence (lane 1) or absence (lane 2) of 200 µM
GABA for 2 hr at 37°C. The cells were washed with GSH buffer and the GABAA receptor
polypeptides were analyzed as in Fig. 2.

[Autoradiograph: lanes 1 and 2; 53-kDa band marked.]
[Bar graph; incubation conditions labeled 37°, 37° and 4°.]


Figure 4. Acute sequestration and degradation of
surface-derived GABAA receptor polypeptides. Experiments
were carried out as described for Fig. 3.
Cells were labeled with 125I-DPSgt, incubated with
200 µM GABA under the conditions shown, and then
washed with GSH buffer where indicated. GABAA
receptor polypeptides were analyzed as in Table 1.
The results are expressed as a percentage of controls
in which the GABA treatment and GSH wash were
omitted.


receptor  internalization  is  unlikely  to  occur  at  4°C,  it  is  probable  that  the  small  amount  of 
polypeptide  recovered  under  these  conditions  is  due  to  residual  surface  label  which  was  not 
removed  by  the  GSH  wash. 
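The protection assay reduces to comparing GSH-resistant label with the label initially at the surface. A minimal sketch, assuming background-subtracted band densities and treating the 4°C sample as an estimate of residual surface label that GSH failed to strip (a correction the reasoning above motivates but the text does not state was applied); the function name and the readings are hypothetical:

```python
def percent_internalized(protected_37c: float, protected_4c: float, surface_total: float) -> float:
    """Percent of surface label that became GSH-resistant at 37 C.

    protected_37c: GSH-resistant density after incubation with GABA at 37 C.
    protected_4c:  GSH-resistant density at 4 C, taken here as unstripped surface label.
    surface_total: density of the same band in cells not washed with GSH.
    """
    if surface_total <= 0:
        raise ValueError("surface_total must be positive")
    corrected = max(protected_37c - protected_4c, 0.0)  # never report negative uptake
    return 100.0 * corrected / surface_total


# Hypothetical densitometry readings in arbitrary units.
print(f"{percent_internalized(18.0, 2.0, 100.0):.1f}% internalized")
```

Subtracting the 4°C value is a conservative choice: if some of that label were genuinely internalized, the estimate would be slightly low rather than inflated.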

We consistently found that the amount of internalized GABAA receptor polypeptides
was greater after a 2 hr than after a 4 hr exposure of the neurons to GABA. Quantitation of
a typical film revealed that 7.9% of the surface polypeptides were recovered in the intracellular
fraction after the 4 hr treatment, compared to 16% after 2 hr (Fig. 4). Since the surface
subunits which are subject to chronic down-regulation are not retained by the neurons, it
appears likely that the loss of sequestered polypeptides found in the 4 hr GABA treatment
is due to intracellular degradation. A possible role for lysosomal proteases in this process
was evaluated by the addition of 50 µM chloroquine during the acute GABA treatment. Since
chloroquine had no detectable effect on the amount of internalized receptor polypeptides,
lysosomes appear not to be involved in the degradation.
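The drop from 16% to 7.9% can be expressed as the share of the 2-hr internalized pool that was no longer detected at 4 hr. The sketch below does only that arithmetic on the values from the typical film. It assumes no additional uptake between 2 and 4 hr, which the experiment does not establish; any further uptake would make the true loss larger, so the result is best read as a lower bound.

```python
def fraction_lost(earlier_pct: float, later_pct: float) -> float:
    """Share of an internalized pool no longer detected at the later time point.

    Assumes no new internalization between the two time points; any further
    uptake would mean the true loss was larger than this estimate.
    """
    if earlier_pct <= 0:
        raise ValueError("earlier_pct must be positive")
    return (earlier_pct - later_pct) / earlier_pct


# Internalized label as a percentage of surface label (typical film, Fig. 4).
loss = fraction_lost(16.0, 7.9)
print(f"at least {100 * loss:.0f}% of the 2-hr pool lost by 4 hr")
```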


CONCLUSIONS 

We have shown previously that GABAA receptor ligand binding sites and gated
chloride channels are down-regulated during chronic exposure of cortical neurons to GABA
(Tehrani and Barnes, 1988; Hablitz et al., 1989). This has been independently confirmed in
a number of other laboratories. Using the DPSgt labeling procedure (Calkin and Barnes,
1994), we have demonstrated that these reductions in GABAA receptor function can be
accounted for by a corresponding loss of the receptor subunit polypeptides from the neuronal
surface. The down-regulation of these subunits was also induced by isoguvacine, a GABAA
receptor-specific agonist, and could be completely prevented by the specific antagonist
R5135. This rules out involvement of other known GABA binding proteins, such as GABAB
receptors or GABA transporters, and indicates that the GABAA receptor provides the agonist
site for its own down-regulation.

The DPSgt labeling procedure also permitted us to examine the fate of GABAA
receptor polypeptides during their down-regulation from the cell surface. By stripping the
intact cells with GSH, the label associated with exterior domains of these proteins was
removed, revealing a fraction of the subunits (approximately 16%) which had become
sequestered as a consequence of acute GABA application at 37°C. No detectable sequestration
occurred in the absence of GABA or in the presence of GABA plus R5135. It was shown
previously that acute exposure of these neuronal preparations to GABA or clonazepam
increases the fraction of internal/total receptor binding sites (Tehrani and Barnes, 1991). The
current studies indicate that this increase is probably due to GABAA receptors derived acutely
from the surface. Although the vehicle for this sequestration in vitro is not known, clathrin-coated
vesicles are strongly implicated by GABAA receptor binding studies in rodent brain
(Tehrani and Barnes, 1993). Since administration of lorazepam to mice increases receptor
binding in clathrin-coated vesicles while reducing that in synaptic membranes (Tehrani and
Barnes, 1994), it appears that agonist-dependent sequestration of GABAA receptors also
occurs in vivo.

As the acute GABA treatment in culture progressed from 2 to 4 hr, the amount of
internalized receptor polypeptides declined, suggesting that they were degraded intracellularly.
This is consistent with the results obtained from the chronic agonist treatment, which
show that GABAA receptors do not accumulate within the cells after the initial down-regulation.
Since the entire cellular pool of receptor polypeptides and ligand binding sites
decreases by 50-60% during the chronic treatment, degradation appears to be a likely
mechanism for down-regulation. However, reduction of receptor biosynthesis, an important
mechanism in the down-regulation of β-adrenergic receptors (Collins et al., 1992), is an
alternative. Indeed, chronic exposure of cortical preparations to GABA reduces the levels of
GABAA receptor α1, α2, and α3 subunit mRNAs (Montpied et al., 1991; Mhatre and Ticku,
1994). Studies in our laboratory are in accord with these findings (Baumgartner et al., 1994).
Quantitative RT-PCR analysis reveals that the α1, β2, β4, γ1, and γ2 subunit mRNAs are all
reduced by similar amounts (47-65%) by a 7-day exposure of the cells to GABA. A more
detailed examination of the decline of the α1-subunit transcript revealed that no significant
change was produced during the first 4 days of GABA treatment. However, after 4 days of
exposure there was already a 50% reduction in the density of GABAA ligand binding. Since the
attenuation of GABAA receptor subunit mRNAs appears to be a relatively slow process when
compared to that for subunit polypeptides and ligand binding sites, we propose that
translational or post-translational mechanisms are responsible for the initial stages of
receptor down-regulation. The studies reported here suggest that agonist-induced
sequestration and degradation of GABAA receptor subunits from the neuronal surface may
play roles in this process.
Human leucocyte collagenase is one member of the growing protein family of matrix
metalloproteinases (MMPs) [Knauper et al., 1990]. It is a calcium-containing Zn-endoproteinase
(MMP-8) that preferentially cleaves native triple-helical interstitial type I collagen, but also
type II and type III collagen, into one-quarter and three-quarter fragments of the native chain
length. It thus differs from the fibroblast interstitial collagenase, which preferentially cleaves
type III. About one-third of its mass of 65 kDa (for the active enzyme) is carbohydrate, in
contrast to the homologous interstitial collagenase from fibroblasts, which carries only a
small carbohydrate portion [Tschesche et al., 1992]. The enzyme is stored in the specific
granules of granulocytes and is released as a proenzyme, also designated latent enzyme, upon
stimulation of the cells by various chemotactic agents, such as formylpeptides, LTB4, C5a,
F2a and zymosan, among others [Tschesche et al., 1989 and 1991]. Extracellular activation
is then achieved by various proteinases, such as trypsin, kallikrein, chymotrypsin,
cathepsin G [Tschesche et al., 1992] or stromelysin [Knauper et al., 1993]. However, the
physiological process of activation is not yet fully understood, since activation was also
observed with isolated leucocyte membranes [Tschesche, unpublished].

The enzyme has a multidomain structure, as do the other members of the
MMP family. The hydrophobic signal peptide sequence, as deduced from the cDNA
sequence, is not present in the secreted proenzyme. The secretory precursor form starts with
the N-terminal propeptide domain of about 80 residues, which confers latency on the
enzyme. The following domain of 163 residues bears the catalytic machinery with the
reactive site residues and the zinc binding site. A hemopexin-like C-terminal domain of 188
residues is linked to the catalytic domain by a 16-residue hinge region; this C-terminal domain
was shown to be crucial for the substrate specificity of the leucocyte collagenase [Schnierer et al., 1993].
While the truncated catalytic domain itself is an enzyme exhibiting substrate specificity for
cleaving peptides [Diekmann and Tschesche, 1994] and globular proteins, such as the
serpins α1-proteinase inhibitor [Knauper et al., 1990], C1-esterase inhibitor, and α2-antiplasmin
[Knauper et al., 1991], it has no helicase activity in cleaving triple-helical type I, II or
[Plot: secretion stimulated by FNLPNTL; legend: Control, FNLPNTL at two concentrations.]

Figure 1. Secretion of collagenase from human leucocytes, unstimulated (- - -) and stimulated
by two concentrations of FNLPNTL (-■-■- and -•-•-).
Figure 2. Schematic representation of the domain structure of the family of matrix metalloproteinases
(MMPs). C denotes a free cysteine residue in the conserved PRCGVPD sequence motif responsible for
chelating the catalytic zinc and maintaining latency of the proenzyme form. The catalytic zinc is denoted by
Zn. The number of amino acid residues per domain is indicated below each block.
Figure 3. Sequence of amino acid residues (one-letter code) in the full-length leucocyte collagenase. Activation
sites for stromelysin (Phe79 form) and for serine proteinases (Met80 form) are indicated, as are the
autocatalytic cleavage sites separating the catalytic and the hemopexin-like domains.


III collagen. Only the full-length enzyme cleaves triple-helical collagen into the characteristic
one-quarter and three-quarter fragments [Schnierer et al., 1993].

Activation of the latent precursor form requires removal of the propeptide domain,
either by proteolytic enzymes or by autoactivation after molecular rearrangement [Knauper
et al., 1990]. Depending on the enzyme used for activation, a 78-residue (stromelysin activation) or
a 79-residue (cathepsin G activation) propeptide is cleaved from the N-terminus. The single
unpaired Cys of the strongly conserved PRCGVPD sequence motif within the propeptide
domain is assumed to provide the fourth coordination ligand of the active-site zinc. Activation
requires replacement of the coordinating Cys moiety by a water molecule. This opening of
the reactive site, induced by molecular rearrangement or proteolysis, is the basis of the generally
accepted cysteine-switch hypothesis of activation [Van Wart and Birkedal-Hansen, 1990].
Interestingly, the stromelysin-activated enzyme with N-terminal Phe79 was
about three to four times more active than the trypsin-, chymotrypsin- or cathepsin G-activated
forms with N-terminal Met80 [Knauper et al., 1993].
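The two activated forms differ by one residue at the new N-terminus. Assuming residues are numbered from the first residue of the secreted proenzyme (an assumption consistent with the Phe79 and Met80 designations used here), the number of the new N-terminal residue follows directly from the length of the propeptide removed. The helper below is a hypothetical illustration, not part of the original work:

```python
def new_n_terminus(propeptide_length: int, first_residue: int = 1) -> int:
    """Number of the first residue left after removing an N-terminal propeptide.

    Assumes numbering starts at the first residue of the secreted proenzyme.
    """
    if propeptide_length < 0:
        raise ValueError("propeptide_length must be non-negative")
    return first_residue + propeptide_length


for activator, length in [("stromelysin", 78), ("cathepsin G", 79)]:
    print(activator, new_n_terminus(length))
```

Removing 78 residues leaves Phe79 at the N-terminus, and removing 79 leaves Met80, matching the two forms compared in the text.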

The three-dimensional structure of the catalytic domain of human leucocyte interstitial
collagenase was solved at 2.0 Å resolution after crystallisation of the recombinant protein expressed
in E. coli [Bode et al., 1994]. The spherical molecule contains a flat active-site cleft separating
the smaller C-terminal part from the larger N-terminal part, which is built of a central, highly
twisted five-stranded β-sheet, flanked by an S-shaped double loop and two additional
bridging loops on its convex side and by two long α-helices on its concave side. The catalytic
zinc ion is located at the bottom of the active-site cleft and is coordinated by the N-atoms of
the three His residues within the His-Glu-X-X-His-X-X-Gly-X-X-His zinc-binding
consensus sequence. The active-site helix contains the first two of these His residues and the Glu,
and extends to the point where the polypeptide chain turns away from the helix axis towards the third zinc
ligand, His. Besides the “catalytic” zinc ion, a second “structural” zinc ion is sandwiched
between the S-shaped surface double loop, which runs from an Arg to a Leu residue, and the surface of the β-sheet. It
is tetrahedrally coordinated by His, Asp, His and His residues, while a structural calcium
ion is octahedrally coordinated by Asp, Gly, Asn, Ile, Asp and Glu. A second
structural calcium ion is located on the convex side of the β-sheet. It is also octahedrally
coordinated, by Asp, Gly, Gly, Asp and two water molecules.
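The His-Glu-X-X-His-X-X-Gly-X-X-His zinc-binding consensus (HEXXHXXGXXH), and likewise the PRCGVPD cysteine-switch motif of the propeptide, can be located in a one-letter sequence with a regular expression. The sketch below is illustrative only: the test sequence is an invented fragment, not the leucocyte collagenase sequence of Fig. 3, and the function reports 1-based positions within whatever string it is given.

```python
import re

# HEXXHXXGXXH: His-Glu-X-X-His-X-X-Gly-X-X-His, with X any residue.
ZINC_MOTIF = re.compile(r"HE..H..G..H")
# Conserved propeptide motif carrying the "cysteine switch" Cys.
CYS_SWITCH = re.compile(r"PRCGVPD")


def find_motifs(sequence: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif in a one-letter sequence."""
    seq = sequence.upper()
    return {
        "zinc_binding": [m.start() + 1 for m in ZINC_MOTIF.finditer(seq)],
        "cysteine_switch": [m.start() + 1 for m in CYS_SWITCH.finditer(seq)],
    }


# Invented fragment for illustration only.
demo = "MKTPRCGVPDVAAGLSHEFGHSLGLSHSTD"
print(find_motifs(demo))
```

A regular expression is sufficient here because both motifs have fixed length and fixed spacing; a profile or HMM search would be needed only for more degenerate patterns.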

The small C-terminal domain exhibits a largely irregular fold, with a right-handed
loop followed by an α-helix. The loop is stabilised by a tight 1,4-turn, Ala-Leu-Met-Tyr,
known as the “Met-turn” [Gomis-Ruth et al., 1993; Bode et al., 1993], a conserved
topological element of the “metzincins” [Bode et al., 1993] that provides a hydrophobic base
for the catalytic zinc ion and the three His residues which ligate it.

To stabilise the catalytic domain, a zinc-chelating inhibitor, Pro-Leu-Gly-hydroxamate,
was co-crystallised. The inhibitor lies antiparallel to the edge strand
(βIV), with its Pro residing in a hydrophobic groove formed by the side chains of His, Phe
and Ser; its Leu forms two inter-main-chain hydrogen bonds to Ala, and its side chain is
situated in a small opening lined by His, Ala and His, while Gly-NHOH is oriented
towards the active-site Glu, with the carbonyl oxygen and the hydroxyl oxygen complexing the
“catalytic” zinc. A characteristic feature of the X-ray structure of the Met80 form of the catalytic domain
of human leucocyte collagenase is that electron density is observed only from the seventh
residue, Pro86, onwards. In contrast, the structure of the Phe79 variant reveals that the N-terminal
heptapeptide segment Phe79-Met-Leu-Thr-Pro-Gly-Asn85 binds to the concave surface at
the bottom of the molecule, between a Pro and a Ser residue [Reinemer et al., 1994]. The side chain
of Trp slots into a hydrophobic groove formed by the side chains of Leu, Ile, Pro
and Gly, and Thr82 fits into a small hydrophobic groove lined by Pro, Gln, Gly and
Ala. N-terminal to Thr82, the chain turns downward to the C-terminal helix αC, crossing
over the strand following helix αB at a Leu residue to form the only regular inter-main-chain
hydrogen bond.

The N-terminal ammonium group of Phe79 forms a salt bridge with the carboxylate
moiety of a strictly conserved Asp (Fig. 2a and 2b). The side chains of Leu81 and Phe79
are oriented towards the C-terminal helix, the former packing against the side chains of
helix αC and Asn, and the latter packing against the strictly conserved Gly and against
Ala of the C-terminal helix αC, while the side chain of Met80 points towards the bulk
solvent and seems to be disordered, as no significant electron density could be observed for
this residue. Interestingly, the N-terminus locks two water molecules in a cavity created by
the C-terminal helix and the strand following the active-site helix and lined by the side chains
of Val, Trp, Met, Asp, Asp and Gly. Both internal solvent molecules are
hydrogen bonded to Asp Oδ1 and to Met NH.

Figure 4. Representation of the leucocyte collagenase catalytic domain as a ribbon plot,
with the two zinc and the two calcium ions. In the Met80-Gly242 form, the N-terminal
heptapeptide FMLTPGN preceding Pro86 is disordered; in the Phe79-Gly242 form it packs
against a concave hydrophobic surface of the enzyme made by the C-terminal helix (see text).

The disorder-order transition of the N-terminal segment between the two structural forms
(i.e. the Phe79-Gly242 and the Met80-Gly242 catalytic domains) must in some way be related
to the enhancement of activity. The formation of the salt bridge between the N-terminal ammonium
group of Phe79 and the side-chain carboxylate of the conserved Asp seems to stabilise the
active site via the neighbouring Asp. The latter residue, which, like the former, is strictly
conserved, has its side chain buried and forms hydrogen bonds to the Met-turn residues Leu
N and Met N, thus stabilising the active-site basement. This is in accordance with the
finding that this residue is essential for catalytic activity [Hirose et al., 1993]. Stabilisation
of the active site might be a prerequisite for stabilisation of the transition states.

The specific "triple-helicase" activity of the full-length enzyme containing the
C-terminal hemopexin-like domain evidently requires at least partial unfolding of the
collagen triple helix around the active site [Bode et al., 1994]. The repetitive Pro-X-Gly
segment of one strand of a regular collagen triple helix could probably be arranged such
that the glycyl carbonyl group approaches the catalytic zinc. However, the P1'-proline
side chain would then not adequately fill the S1' subsite pocket of the enzyme, and the 15 Å
diameter collagen triple helix [Yonath and Traub, 1969; Fraser et al., 1979] would not fit
properly through the opening at the S2' and S3' subsites.

Elucidation of the X-ray structure of the leucocyte interstitial collagenase catalytic
domain allows a better understanding of the catalytic properties of the enzyme and
facilitates the design of low-molecular-weight enzyme inhibitors [Grams et
al., 1994].
INTRODUCTION

Agents that cause damage to DNA (DNA damage-inducing, or DDI, agents) arrest cell
cycle progression in all eukaryotes, from yeast to humans, at positions in late G1 and G2 that
have become known as "checkpoints" (Hartwell and Weinert, 1989; Murray, 1992; Sheldrick
and Carr, 1993; Weinert and Lydall, 1993), presumably to allow time for DNA repair.
Otherwise the DNA damage would become irreversibly fixed as a consequence of DNA
replication in S phase, or through cell division at mitosis (M phase). The mammalian G2
checkpoint mechanism is not yet well characterized (O'Connor and Kohn, 1992), but the
key observation that tumor cells with mutant p53 were unable to arrest in G1 (Kastan et al.,
1991) quickly led to an outline of the mammalian G1 checkpoint mechanism (Hunter, 1993;
Appella and Anderson, 1994; Appella et al., this volume). The p53 tumor suppressor gene
encodes a transcription factor that is normally relatively inactive because it is rapidly degraded
(Levine, 1993). However, in response to ultraviolet light, ionizing radiation, and
other DDI agents, the p53 protein is transiently stabilized, accumulates in the cell nucleus,
and induces the expression of several genes, including WAF1 and GADD45 (El-Deiry et al.,
1993; El-Deiry et al., 1994). The 21 kDa product of WAF1 is a potent inhibitor of the
cyclin-dependent protein kinases that are needed for the transition from G1 to S phase and
for continued DNA replication in S phase (Dulić et al., 1994; Harper et al., 1993). Although the G1
checkpoint mechanism probably is much more complex, the induction of WAF1 provides a
simple explanation of how cell cycle progression can be arrested. In addition to WAF1, about
50 other genes are known to be induced in mammalian cells after exposure to DDI agents
(Fornace, 1992; Herrlich and Rahmsdorf, 1994). Recent studies indicate that some genes are
induced as a consequence of the effects of DDI agents on other cellular molecules and not
necessarily as a consequence of damage to DNA (Anderson, 1994; Herrlich and Rahmsdorf,
1994; Sachsenmaier et al., 1994). Such exposures activate cytoplasmic signaling
mechanisms that operate through protein kinase cascades initiated at or near the plasma membrane;
in turn, these kinase cascades activate several transcription factors, including AP-1 and
NF-κB. Nevertheless, there is strong evidence that the p53-dependent induction of WAF1 is
a direct consequence of the production of DNA strand breaks (Nelson and Kastan, 1994) and
that DNA strand breaks are the signals for activation of the G1 checkpoint(s) in yeast (Siede
et al., 1994).

Although much has been learned about the structure and function of p53 and the
probable sequence of subsequent events that lead to cell cycle arrest, little is known about
how DNA damage is detected or about the nature of the signal that it generates.
Circumstantial evidence suggests that protein kinases may be involved. Indeed, several yeast
kinase genes were identified by screening for mutants defective in their ability to arrest cell
cycle progression after exposure to DDI agents (Walworth et al., 1993; Weinert et al., 1994;
Allen et al., 1994; Anderson, 1994). In mammalian cells the situation is less clear; however,
in hamster cells, 2-aminopurine overrides the G1, S, and G2 checkpoints, and 2-aminopurine
and H7, another protein kinase inhibitor, block DDI-agent induction of the GADD45 gene
(Andreassen and Margolis, 1992; Luethy and Holbrook, 1994).

Two moderately abundant nuclear enzymes have been described in mammalian cells
that recognize DNA strand breaks and transmit signals to other proteins. One is
poly(ADP-ribose) polymerase (de Murcia and de Murcia, 1994; Satoh and Lindahl, 1992); this enzyme
is activated by binding to nicks, and it poly(ADP-ribosyl)ates histones, other chromosomal proteins, and
itself. It may be responsible for altering chromatin structure near sites of DNA damage and
also may signal the presence of damage through transient changes in NAD levels. A second
DNA structure-signaling enzyme is DNA-PK (Anderson, 1993). DNA-PK is activated by
binding to DNAs with nicks, gaps, or double-strand breaks, and it may function, at least in
part, as a detector of DNA strand breaks.

THE  STRUCTURE  OF  DNA-PK 

DNA-PK is believed to consist of a very large polypeptide, DNA-PKc (or Prkdc, for
protein kinase, DNA-activated, catalytic component), that probably contains the catalytic
site, and a DNA-binding/targeting and regulatory subunit, which can be the Ku autoantigen
(Dvir et al., 1992; Gottlieb and Jackson, 1993; Anderson, 1993). The large DNA-PKc
polypeptide and DNA-activated kinase activity co-purify (Lees-Miller et al., 1990; Carter
et al., 1990). The size of the DNA-PKc polypeptide was initially estimated by
SDS-polyacrylamide gel electrophoresis to be 300-350 kDa; however, the size of the nascent
polypeptide, estimated from preliminary sequence analysis of the ~13 kbp cDNA, is close
to 450 kDa. The difference is attributable to the lack of good molecular weight markers in
this size range. Although DNA-PKc is nearly ten times the size of many protein kinase catalytic subunits,
several findings are consistent with assigning the catalytic site to the 450 kDa
DNA-PKc polypeptide. DNA-PKc binds ATP and can be labeled by the ATP analogues
fluorosulfonylbenzoyladenosine (FSBA) and azido-ATP; FSBA inhibits DNA-PK kinase
activity (Lees-Miller et al., 1990). A monoclonal antibody specific for the DNA-PKc
polypeptide depleted DNA-dependent kinase activity from HeLa extracts (Carter et al.,
1990). Finally, sequence analysis of the cDNA has revealed a segment with homology to
other kinase catalytic domains. In the human cell lines that have been examined, DNA-PKc
is moderately abundant and predominantly nuclear (Anderson and Lees-Miller, 1992). We
estimated its abundance to be about 50,000 molecules per cell, but our estimate involved
several assumptions and an accurate measurement has not been made. DNA-PK activity is
approximately 100-fold higher in extracts of human and monkey cell lines than in
extracts of rodent and insect cell lines; thus, DNA-PK activity and the DNA-PKc
polypeptide are difficult to detect with current assays in unfractionated extracts of
non-primate cells (Anderson and Lees-Miller, 1992).
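The ~450 kDa figure follows from simple arithmetic on the cDNA length. The sketch below is a back-of-the-envelope check under stated assumptions that are not from the source: three nucleotides per residue, the stop codon ignored, and an average residue mass of about 110 Da. Because a cDNA also carries untranslated regions, treating all ~13 kbp as coding gives only an upper bound; the 12,300 nt open reading frame is a hypothetical value chosen for illustration.

```python
# Back-of-the-envelope polypeptide mass from a coding-sequence length.
# Assumptions (not from the source): 3 nt per residue, stop codon ignored,
# average residue mass of about 110 Da.
AVG_RESIDUE_MASS_DA = 110.0


def residues_from_orf(orf_nt: int) -> int:
    """Number of residues encoded by an open reading frame of orf_nt nucleotides."""
    if orf_nt < 3:
        raise ValueError("an ORF needs at least one codon")
    return orf_nt // 3


def estimated_mass_kda(orf_nt: int, avg_residue_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate polypeptide mass in kDa for an ORF of orf_nt nucleotides."""
    return residues_from_orf(orf_nt) * avg_residue_da / 1000.0


if __name__ == "__main__":
    # Upper bound: pretend the entire ~13 kbp cDNA is coding (it is not; UTRs exist).
    print(f"{estimated_mass_kda(13_000):.0f} kDa")  # 477 kDa
    # Hypothetical ORF of 12,300 nt (about 700 nt of UTR assumed).
    print(f"{estimated_mass_kda(12_300):.0f} kDa")  # 451 kDa
```

Either way the estimate lies well above the 300-350 kDa read from SDS gels, which is the discrepancy the text attributes to the lack of suitable size markers.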

Ku was first recognized as a heterodimeric (p70/p80) nuclear phosphoprotein that
reacted with sera from patients suffering from the autoimmune diseases lupus erythematosus
and scleroderma-polymyositis (Mimori et al., 1981; Reeves, 1992). HeLa cells contain
~400,000 molecules of Ku per cell (Mimori et al., 1986), but Ku also appears to be less
abundant in non-primate cells and was not detected in mouse L-929 cells using mouse-specific
monoclonal antibodies (Wang et al., 1993). In vitro, Ku binds initially to the ends of
linear DNA fragments but then can translocate along the DNA in an ATP-independent manner
(de Vries et al., 1989). Ku also recognizes DNAs with nicks and gaps, as well as DNAs with
single- to double-strand transitions (Blier et al., 1993; Falzon et al., 1993), and these
structures activate DNA-PK (Morozov et al., 1994). cDNAs for the two Ku polypeptides
have been cloned and sequenced from both human and mouse cells (Chan et al., 1988; Reeves
and Sthoeger, 1989; Yaneva et al., 1989; Mimori et al., 1990; Griffith et al., 1992; Porges et
al., 1990; Falzon and Kuff, 1992), and the human genes were recently mapped to
chromosomes 2 (p80) and 22 (p70) (Cai et al., 1994). The location of the p80 Ku subunit gene
corresponds to the location of the human gene (XRCC5) that complements ionizing-radiation
sensitivity in group 5 hamster cells. Transfection of the cDNA for the human Ku p80 subunit
into group 5 hamster cells, which are defective in repairing double-strand breaks, restored their
X-ray sensitivity to normal levels and corrected the defect in site-specific recombination
(Rathmell and Chu, 1994a,b; Getts and Stamato, 1994; Taccioli et al., 1994). Thus, Ku is
likely to play a role in these processes. It may protect DNA ends from exonucleolytic
degradation; however, by activating DNA-PK, it also might have other signaling functions.
Recently, Ku was shown to have DNA helicase activity and to be identical to a previously
described activity, human DNA helicase II (Tuteja et al., 1994); thus, Ku probably also
performs functions that are independent of DNA-PKc. It is not known whether other targeting
proteins can substitute for Ku to activate DNA-PKc, perhaps in response to other signals.


DNA-PK  SUBSTRATE  SPECIFICITY 

In vitro, human DNA-PK phosphorylates a variety of nuclear DNA-binding regulatory
proteins, including the tumor suppressor protein p53, the single-stranded DNA-binding
protein RPA, the heat shock protein hsp90, the large tumor antigen (TAg) of simian virus
40, a variety of transcription factors (including Fos, Jun, serum response factor (SRF), Myc,
Sp1, Oct-1, TFIID, E2F, and the estrogen receptor), and the large subunit of RNA polymerase II
(reviewed in Anderson, 1993; Jackson et al., 1993). However, for most of these proteins, the
sites that are phosphorylated by DNA-PK are not known.

To determine whether the sites phosphorylated in vitro were also phosphorylated
in vivo, and whether DNA-PK recognizes a preferred protein sequence, we identified the sites
phosphorylated by DNA-PK in several substrates by direct protein sequence analysis.
Table 1 shows an alignment of known DNA-PK phosphorylation sites. Each phosphorylated
serine or threonine is immediately followed by glutamine in the polypeptide chain; at no
other position are the amino acid residues obviously constrained.
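Because every site in Table 1 is a Ser or Thr immediately followed by Gln, candidate sites can be listed with a simple pattern search. The helper below is a hypothetical illustration built on Python's standard `re` module, not a tool used in this work; the function name and the `first_residue` numbering parameter are assumptions made for the example.

```python
import re
from typing import List, Tuple

# Candidate DNA-PK sites: a Ser or Thr immediately followed by Gln. The lookahead
# keeps the Gln out of the match, so m.start() is the position of the Ser/Thr.
SQ_TQ = re.compile(r"[ST](?=Q)")


def find_sq_tq_sites(seq: str, first_residue: int = 1) -> List[Tuple[int, str]]:
    """Return (residue number, residue) for every S/T that precedes a Q.

    first_residue is the number of seq's first residue, so that fragments can be
    numbered as in the intact protein. Hyphens (as written in Table 1) are ignored.
    """
    clean = seq.upper().replace("-", "")
    return [(m.start() + first_residue, m.group(0)) for m in SQ_TQ.finditer(clean)]


if __name__ == "__main__":
    print(find_sq_tq_sites("P-E-E-T-Q-T-Q-D-Q-P-M-E-E"))           # [(4, 'T'), (6, 'T')]
    print(find_sq_tq_sites("ETGIDSQSQGSFQAP", first_residue=660))  # [(665, 'S'), (667, 'S')]
    print(find_sq_tq_sites("YSPTSPS" * 4 + "RRR"))                  # [] (CTD peptide)
```

A match is only a candidate. TAg Ser639 is followed by Gln but was not phosphorylated by DNA-PK in vitro, whereas the vector-encoded Ser3 of the Oct-1 POU construct, which is not followed by Gln, was; the RNA polymerase II CTD peptide likewise contains no S/T-Q motif at all.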

Two forms of hsp90, designated α and β, are found in human cells, and these proteins
are 97% identical; however, only hsp90α is phosphorylated by DNA-PK (Lees-Miller and
Anderson, 1989b). The phosphorylated sites are two threonines at the amino terminus of
hsp90α, in the sequence PEETQTQDQPM; these residues are not present in hsp90β. SV40
TAg was shown to be phosphorylated at four sites, serines 120, 665, 667, and 677 (Chen et
al., 1991). Isolation of the phosphopeptides containing these residues was accomplished
using iron-affinity chromatography in conjunction with conventional reverse-phase HPLC.
The phosphorylated serines were then identified by direct sequence analysis after converting
the phosphoserine to S-ethylcysteine. Each phosphorylated TAg serine is immediately
followed by glutamine (Table 1). Serines 120 and 677 are phosphorylated in vivo, but serines
665 and 667 have not been shown to be in vivo sites of phosphorylation. TAg serine 639 is
phosphorylated in vivo, and although this serine is followed by glutamine in the TAg
polypeptide, it was not phosphorylated by DNA-PK in vitro. This finding suggests either that
Ser639 is phosphorylated by a different kinase or that its phosphorylation by DNA-PK
requires a particular TAg conformation that was not present in the in vitro reaction.

Table 1. Protein phosphorylation sites recognized by human DNA-PK

Substrate protein(a)                 DNA-PK site   Local amino acid sequence
Hsp90α (human)                       Thr4          P-E-E-T*-Q-T-Q-D-Q-P-M-E-E
                                     Thr6          P-E-E-T-Q-T*-Q-D-Q-P-M-E-E-E-E
SV40 large tumor antigen             Ser120        E-A-T-A-D-S*-Q-H-S-T-P-P-K-K-K
                                     Ser665        E-T-G-I-D-S*-Q-S-Q-G-S-F-Q-A-P
                                     Ser667        G-I-D-S-Q-S*-Q-G-S-F-Q-A-P-Q-S
                                     Ser677        Q-A-P-Q-S-S*-Q-S-V-H-D-H-N-Q-P
c-Jun transcription factor (human)   Ser249        P-I-D-M-E-S*-Q-E-R-I-K-A-E-R-K
Serum response factor (human)        Ser435        V-L-N-A-F-S*-Q-A-P-S-T-M-Q-V-
                                     Ser446        M-Q-V-S-H-S*-Q-V-Q-E-P-G-G-V-P
p53 tumor suppressor (mouse)         Ser4          M-E-E-S*-Q-S-D-I-S-L-E-L-P
                     (mouse)         Ser15         L-E-L-P-L-S*-Q-E-T-F-S-G-L-W-K
                     (human)         Ser15         V-E-P-P-L-S*-Q-E-T-F-S-D-L-W-K

(a) Phosphorylation sites (*) were identified by: hsp90, Lees-Miller and Anderson (1989a); SV40
TAg, Chen et al. (1991); c-Jun, Bannister et al. (1993); serum response factor (SRF), Liu et al.
(1993); and p53, Lees-Miller et al. (1992).

Subsequent to our initial work, Bannister et al. (1993) used a genetic approach to
identify serine 249 in the DNA-binding region of Jun as the likely site of phosphorylation
by DNA-PK. This site is phosphorylated in vivo, but it also can be phosphorylated in vitro
by casein kinase II. Jun residue 250 is glutamine; changing it to alanine largely prevented
DNA-dependent Jun phosphorylation, again suggesting that glutamine is important for
substrate recognition by DNA-PK. Changing glutamic acid 251 to alanine decreased the rate
of Jun phosphorylation about twofold, indicating that the residue in this position also may
contribute to substrate recognition. Glutamic acid is also present at the +2 position with
respect to the serine 15 site of human and mouse p53, and aspartic acid (D) is present at +2
after the second TQ site in hsp90 (Table 1; see below), but neither glutamic nor aspartic acid
is at this position in the sites from SV40 TAg or the human serum response factor (SRF). Two
serines followed by glutamine in a peptide derived from the carboxy-terminal transactivation
domain of SRF are phosphorylated by DNA-PK, and they also appear to be phosphorylated
in serum-stimulated cells (Liu et al., 1993).
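The +2 comparison above can be checked directly against the sequence windows of Table 1. The sketch below is illustrative only: the dictionary is transcribed by hand from Table 1 (hyphens removed, with the 0-based index of the phosphorylated residue in each window), the site labels are ad hoc, and Ser446 is listed under SRF because its window overlaps the Ser435 window (M-Q-V).

```python
from typing import Dict, List, Tuple

# Hand-transcribed Table 1 windows: name -> (window, 0-based index of the phospho-site).
TABLE1_SITES: Dict[str, Tuple[str, int]] = {
    "hsp90a Thr4": ("PEETQTQDQPMEE", 3),
    "hsp90a Thr6": ("PEETQTQDQPMEEEE", 5),
    "TAg Ser120": ("EATADSQHSTPPKKK", 5),
    "TAg Ser665": ("ETGIDSQSQGSFQAP", 5),
    "TAg Ser667": ("GIDSQSQGSFQAPQS", 5),
    "TAg Ser677": ("QAPQSSQSVHDHNQP", 5),
    "c-Jun Ser249": ("PIDMESQERIKAERK", 5),
    "SRF Ser435": ("VLNAFSQAPSTMQV", 5),
    "SRF Ser446": ("MQVSHSQVQEPGGVP", 5),
    "p53 mouse Ser4": ("MEESQSDISLELP", 3),
    "p53 mouse Ser15": ("LELPLSQETFSGLWK", 5),
    "p53 human Ser15": ("VEPPLSQETFSDLWK", 5),
}


def residue_at(window: str, site_index: int, offset: int) -> str:
    """Residue at site_index + offset, or '' if that position is outside the window."""
    pos = site_index + offset
    return window[pos] if 0 <= pos < len(window) else ""


def acidic_at_plus2(sites: Dict[str, Tuple[str, int]]) -> List[str]:
    """Sorted names of the sites whose +2 residue is Asp (D) or Glu (E)."""
    # A tuple is used for membership so that '' (outside the window) never matches.
    return sorted(name for name, (win, i) in sites.items()
                  if residue_at(win, i, 2) in ("D", "E"))


if __name__ == "__main__":
    for name, (win, i) in TABLE1_SITES.items():
        assert residue_at(win, i, 1) == "Q", name  # every Table 1 site is S/T-Q
        print(f"{name:16s} +2 = {residue_at(win, i, 2)}")
    print(acidic_at_plus2(TABLE1_SITES))
    # ['c-Jun Ser249', 'hsp90a Thr6', 'p53 human Ser15', 'p53 mouse Ser15']
```

The result reproduces the pattern described in the text: an acidic residue at +2 in c-Jun, in the second hsp90 site, and at serine 15 of human and mouse p53, but at none of the TAg or SRF sites.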

To determine whether an adjacent glutamine is important for phosphorylation site
recognition by DNA-PK, we examined its ability to phosphorylate synthetic peptides
corresponding to segments of the human p53 protein sequence. These peptides covered all
of the known phosphorylation sites in human p53, including serines 9, 15, 33, 315, and 392,
and all -SQ- or -QS- sites (i.e. S15, -LSQE-; S37, -PSQA-; S99, -PSQK-; S166, -QSQH-; and
S376, -GQSTS-) (Lees-Miller et al., 1992). Peptides containing three of the five SQ or QS
motifs were phosphorylated by DNA-PK; one of these, serine 15, is in a highly conserved
region and is phosphorylated in vivo (Ullrich et al., 1993). The sequence requirements for
phosphorylation at this site were examined in more detail using a series of synthetic peptides
(Table 2). Shortening the sequence to fewer than six residues on the carboxy-terminal side
dramatically reduced the rate of peptide phosphorylation, but shortening the sequence on
the amino-terminal side actually increased the rate. Changing the glutamine (Q) at the
position corresponding to p53 residue 16 to tyrosine (Y), asparagine (N), or lysine (K)
decreased the rate of peptide phosphorylation, and inverting the glutamine and the following
glutamic acid (SQE -> SEQ) essentially abolished peptide phosphorylation. Km values for
the peptide substrates varied from about 0.2 to 0.7 mM; these values are not remarkably good
compared with those of peptide substrates for several other kinases (Kemp and Pearson, 1991). A
glutamine or glutamic acid immediately before the SQ motif and a glutamic acid immediately
following it (i.e. QSQE or ESQE, as in the c-Jun site, see above) gave a slight improvement
in the apparent association constant (compare peptides 4, 11, and 15 in Table 2); about a
twofold further improvement was obtained by removing all but two residues from the
amino-terminal side of the phosphorylation site (Table 1, Figure 1). We also noticed that the
version of peptide 4 (EPPLSQEAFADLWKK) ending with an amide is a slightly better
substrate than the same peptide ending with a carboxyl group, perhaps suggesting that the
carboxy-terminal extension of six residues is not quite optimal. To date, the best substrate
peptide for human DNA-PK is PESQEAFADLWKK, while the similar peptide
VESEQAFADLWKK is not appreciably phosphorylated (Table 1, Figure 1).
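The apparent Km and Vmax values discussed here (Figure 1, Table 2) come from Lineweaver-Burk plots, i.e. a straight-line fit of 1/v against 1/[S]. The sketch below shows that calculation in plain Python, together with the Michaelis-Menten rate it implies; the function names are ours, and the concentrations and rates in the demonstration are synthetic, noise-free values generated from the Figure 1 constants for PESQEAFADLWKK rather than measured data.

```python
from typing import Sequence, Tuple


def michaelis_menten_rate(s_mM: float, km_mM: float, vmax: float) -> float:
    """Initial rate v = Vmax*[S] / (Km + [S]); v has the units of vmax."""
    return vmax * s_mM / (km_mM + s_mM)


def lineweaver_burk_fit(s_mM: Sequence[float], v: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (1/[S], 1/v); returns (Km in mM, Vmax).

    Double-reciprocal form: 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax,
    so slope = Km/Vmax and intercept = 1/Vmax.
    """
    if len(s_mM) != len(v) or len(v) < 2:
        raise ValueError("need at least two paired (substrate, rate) points")
    x = [1.0 / s for s in s_mM]
    y = [1.0 / r for r in v]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    return slope * vmax, vmax


if __name__ == "__main__":
    # Synthetic rates generated from Km = 0.20 mM, Vmax = 470 nmol/min/mg
    conc = [0.05, 0.1, 0.2, 0.4, 0.8]
    rates = [michaelis_menten_rate(s, 0.20, 470.0) for s in conc]
    km, vmax = lineweaver_burk_fit(conc, rates)
    print(f"Km = {km:.2f} mM, Vmax = {vmax:.0f} nmol/min/mg")  # Km = 0.20 mM, Vmax = 470 nmol/min/mg
    # Rate at 0.2 mM implied by the Table 2 constants for peptide 1
    print(f"{michaelis_menten_rate(0.2, 0.35, 360.0):.0f} nmol/min/mg")  # 131 nmol/min/mg
```

The 131 nmol/min/mg implied by the peptide 1 constants is close to the 130 listed in Table 2 at 0.2 mM. The double-reciprocal fit is shown only because the figure legend names it; with real, noisy data it weights the low-concentration points heavily, and a direct nonlinear fit of the Michaelis-Menten equation is generally preferred.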

A second substrate determinant for recognition by DNA-PK is the ability to bind
DNA. Jackson et al. first observed that quantitative phosphorylation of Sp1 required a DNA
template with a GC-box DNA-binding element (Jackson et al., 1990). Subsequently it was
found that efficient phosphorylation required that both DNA-PK and Sp1 be bound to the
same DNA molecule (Gottlieb and Jackson, 1993). Similar findings for murine p53 and for a
protein containing the POU DNA-binding domain of human Oct-1 were made by Lees-Miller
and Anderson (Lees-Miller et al., 1992; Anderson and Lees-Miller, 1992). DNA both
activates DNA-PK and increases the local concentration of substrate in the vicinity of the
activated kinase. Short duplex oligonucleotides that bound the kinase or the substrate, but could
not bind both simultaneously, gave much lower rates of phosphorylation for p53 or the POU
domain protein than long DNA fragments; however, the rates of phosphorylation of peptide
substrates or of hsp90, a protein that does not bind DNA, were independent of DNA length
or concentration. Thus, DNA length (above 18 bp) did not affect kinase activation, nor did
the interaction of the substrates with non-specific DNA sequences change their conformation
in a manner that significantly affected the rate of phosphorylation.

Figure 1. Phosphorylation of synthetic peptides by human DNA-PK. Rates of peptide
phosphorylation are shown as a function of peptide concentration. Reactions were at 30°C for
10 min as described (Lees-Miller et al., 1992); the data represent an average of three
independent experiments for each peptide. Apparent Km and Vmax values were determined
from Lineweaver-Burk plots. These values were, respectively: (■) PESQEAFADLWKK-COOH,
0.20 mM, 470 nmol/min/mg; (▼) PEESQEAFADLWKK-COOH, 0.27 mM, 410 nmol/min/mg;
(●) EPPLSQEAFADLWKK-amide, 0.4 mM, 370 nmol/min/mg. Note that the Km value for (●) is
about half that of its carboxyl-terminated equivalent (Table 2, peptide 4). Peptides
EPPLSEQAFDLWKK and PESEQAFADLWKK were not significantly phosphorylated (data not shown).

Table 2. Phosphorylation of synthetic p53 peptide substrates by human DNA-PK

Pept.  Peptide sequence                 Apparent Km   Apparent Vmax    Activity at 0.2 mM
no.                                     (mM)          (nmol/min/mg)    (nmol PO4/min/mg)
p53    11      15        20      24
1      E P P L S Q E T F S D L W K K    0.35          360              130
2      - - - - S Q - - - K K            ND            ND               15
3      - - - - S Q - - . . . K K        ND            ND               12
4      - - - - S Q - A - A - - - - K    0.76          380              83
5      - - - - S Q - A - A - - L - K    0.56          160              36
6      - - - - S Y - A - A - - L - K    ND            ND               2.3
7

A dash (-) indicates that the amino acid is the same as in control peptide 1, p53(11-24)K;
ND, not determined. Phosphorylation reactions were performed as described in Lees-Miller
et al. (1992).

It seems likely that DNA binding also may be important for substrate recognition in
vivo. Many site-specific DNA-binding proteins, including p53 and SV40 TAg, bind DNA in
a non-sequence-specific manner, and this ability may help proteins scan chromatin for their
sequence-specific recognition sites. Presumably, activated DNA-PK is fixed in space by
binding to chromatin, although it may be able to slide in either direction from the initial
binding site. Thus, we imagine that substrates become phosphorylated as they scan along
chromatin strands via their non-sequence-specific DNA-binding mode and collide with
DNA-PK. Since both substrate and kinase will be physically constrained if both are
associated with the same chromatin segment, the specific location of potential sites within
the three-dimensional structure of the substrate also may be important. Most proteins that
function as DNA-PK substrates in vitro are DNA-binding proteins (Anderson, 1993). If
DNA-binding ability accounts for a substantial fraction of substrate specificity and recognition,
then DNA-PK may not have a highly specific sequence or structure recognition ability. This
situation would account for the relatively poor Km values that have been measured for
substrate peptides.

If chromatin binding plays an important role in DNA-PK substrate recognition, then
reaction conditions in the test tube are likely to be far removed from those within the nucleus
of a cell. Thus, some putative substrates or sites that are phosphorylated in vitro may not be
phosphorylated in vivo. One apparent example is the Oct-1 POU expression construct
mentioned earlier. This protein (T7HPOU1) consists of the POU domain from human Oct-1
with a 17-amino-acid amino-terminal extension derived from vector sequences (Figure 2).
1     MASMTGHHHH HHGMSGGMEE PSDLEELEQF
31    AKTFKQRRIK LGFTQGDVGL AMGKLYGNDF
61    SQTTISRFEA LNLSFKNMCK LKPLLEKWLN
91    DAENLSSDSS LSSPSALNSP GIEGLSRRRK
121   KRTSIETNIR VALEKSFLEN QKPTSEEITM
151   IADQLNMEKE VIRVWFCNRR QKEKRINP*

Figure 2. Sequence of the Oct-1/POU expression construct T7HPOU1. The predicted amino acid sequence of
the human Oct-1 POU-domain construct from plasmid pT7HPOU1 (Anderson and Lees-Miller, 1992) is given
in the single-letter code. The POU domain extends from residue 19 to 178; residues 1 to 18 are derived from
vector sequences and include a six-histidine tag for affinity purification. The POU domain has a TQ motif at
residues 44/45 and an SQ motif at residues 61/62 (bold). Serine 3 (bold) is phosphorylated by DNA-PK (see
text); the initiating methionine (residue 1) is removed in E. coli.


The POU domain has two sites that resemble other identified DNA-PK phosphorylation sites,
a TQ at residues 44/45 and an SQ at residues 61/62. Phosphoamino acid analysis revealed
only phosphoserine (data not shown); thus, the TQ sequence is not a DNA-PK
phosphorylation site. After digestion with trypsin and CNBr, none of the phosphopeptides was retained
on a C18 reverse-phase column, but after digestion with trypsin alone, a major, modestly
hydrophobic phosphopeptide was retained by the reverse-phase column (Figure 3).
Amino-terminal sequence analysis showed that this peptide was derived from the amino terminus
of the expressed protein and that the serine at position 3 was partially phosphorylated. These
findings are consistent with the HPLC data shown in Figure 3; after digestion with CNBr
and trypsin, serine 3 would be in the tripeptide Ala-Ser-Met* (Met* = homoserine). We cannot
formally exclude serine 22 as a possible second site of phosphorylation; after digestion with
both CNBr and trypsin, it would be present in the tetrapeptide Ser-Gly-Gly-Met*. However,
another Oct-1/POU derivative containing serine 22 but lacking the serine 3 site was not
phosphorylated by DNA-PK. Serine 3 lies outside the POU domain and is not followed by
a glutamine, but in this case the non-specific DNA-binding ability of the POU domain may
be sufficient to drive phosphorylation of this non-physiological, vector-encoded site.
Distinguishing between physiological and non-physiological substrates and sites may well be
even more difficult for DNA-PK than for other protein kinases.
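The fragment assignments above follow from the cleavage specificities of CNBr (C-terminal to Met) and trypsin (C-terminal to Lys or Arg). The hypothetical helper below makes that reasoning explicit as a simplified in-silico digest. The "no cut before Pro" trypsin rule and the neglect of missed cleavages and side reactions are simplifying assumptions, and the homoserine produced by CNBr is indicated only by an asterisk.

```python
from typing import List


def digest(seq: str, trypsin: bool = True, cnbr: bool = False) -> List[str]:
    """Complete in-silico digest of a protein sequence (one-letter code).

    Simplified rules (assumptions): trypsin cuts after K or R unless the next
    residue is P; CNBr cuts after every internal M. Fragments that end at a CNBr
    cut are marked with '*', because that Met is converted to homoserine (Met*).
    """
    seq = seq.upper()
    cnbr_cuts = set()
    tryptic_cuts = set()
    for i in range(len(seq) - 1):  # no cut is possible after the last residue
        if cnbr and seq[i] == "M":
            cnbr_cuts.add(i + 1)
        if trypsin and seq[i] in "KR" and seq[i + 1] != "P":
            tryptic_cuts.add(i + 1)
    bounds = [0] + sorted(cnbr_cuts | tryptic_cuts) + [len(seq)]
    fragments = []
    for start, end in zip(bounds, bounds[1:]):
        frag = seq[start:end]
        if end in cnbr_cuts:
            frag += "*"
        fragments.append(frag)
    return fragments


if __name__ == "__main__":
    # Residues 2-32 of T7HPOU1 (Figure 2), with the initiating Met already removed
    n_term = "ASMTGHHHHHHGMSGGMEEPSDLEELEQFAK"
    print(digest(n_term, trypsin=True, cnbr=True))
    # ['ASM*', 'TGHHHHHHGM*', 'SGGM*', 'EEPSDLEELEQFAK']
    print(digest(n_term))  # trypsin alone: ['ASMTGHHHHHHGMSGGMEEPSDLEELEQFAK']
```

Applied to the N-terminal segment of T7HPOU1, the combined digest isolates Ser3 in the tripeptide ASM* and also yields the tetrapeptide SGGM* mentioned in the text, whereas trypsin alone leaves Ser3 in a single 31-residue N-terminal peptide.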

Our results with the POU domain protein raise the issue of whether DNA-PK also
phosphorylates serines or threonines in vivo that are not followed by glutamine. One such
putative target is the carboxy-terminal domain (CTD) of the large subunit of RNA
polymerase II. The CTD consists of about 50 conserved repetitions of the consensus heptad motif
YSPTSPS, and, although the CTD contains no SQ or TQ sites, it is phosphorylated efficiently
by DNA-PK when coupled to a GAL4 DNA-binding domain (Peterson et al., 1992). The


Figure  3.  Reverse  phase  fractionation  of  phos¬ 
phopeptides  from  the  Oct- 1  POU  domain  polypep¬ 
tide  after  phosphorylation  by  human  DNA-PK. 
Purified  Oct- 1 /POU-domain  construct  (T7HPOU1, 
see  Figure  3)  was  incubated  with  DNA-PK,  calf 
thymus  DNA,  and  [^^P]ATP  as  described  (Ander¬ 
son  and  Lees-Miller,  1989b),  digested  with  trypsin 
and  CNBr  (solid  line)  or  with  trypsin  alone  (dashed 
line),  and  the  resulting  peptides  were  fractionated 
by  reverse  phase  (Cl 8)  HPLC  essentially  as  de¬ 
scribed  (Chen  et  al.,  1991). 
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Figure  4.  Sequence  analysis  of  an  RNA  polymerase  II CTD  peptide  phosphorylated  by  human  DNA-PK.  Top: 
The  RNA  polymerase  II  CTD  peptide  (YSPTSPS)4RRR,  containing  4  heptad  repeats,  was  labeled  by  incubation 
at  1  mM  with  purified  human  DNA-PK,  calf  thymus  DNA,  and  [^“P]ATP  as  described  (lees-Miller  et  al.,  1992). 
After  desalting  on  BioGel  P4,  the  labeled  peptide  (100,000  cpm)  was  applied  to  a  Beckman  890M  sequencer 
together  with  polybrene  (3  mg)  and  apomyoglobin  (0. 1  mg).  The  radioactivity  released  after  each  Edman  cycle 
is  shown;  the  peptide  sequence  is  given  at  the  top  of  the  figure.  The  major  sites  of  phosphorylation  were  serines 
7,  and  14  (*).  A  substantial  lag  in  the  release  of  phosphate  occurs  with  this  method  of  sequence  analysis. 
Bottom:  The  CTD  peptide  was  phosphorylated  as  described  above,  and  the  phosphorylated  peptide  was 
enriched  by  iron-affinity  chromatography  (Lees-Miller  and  Anderson,  1989a).  Before  sequence  analysis, 
phosphoserine  was  converted  to  S-ethylcysteine  (SEC)  by  incubation  with  ethanethiol  in  0.1  M  barium 
hydroxide  (Lees-Miller  et  al,  1989a).  Shown  are  limited  regions  of  the  PTH-amino  acid  chromatogram  near 
the  diphenylthiourea  (DPTU)  peak  from  sequencer  cycles  1,  2,  5,  7,  and  9;  SEC  elutes  immediately  before 
DPTU.  An  Applied  Biosystems  470 A  protein  sequencer  equipped  for  on-line  PTH  detection  was  used  for  this 
analysis. 


CTD  also  becomes  hyperphosphorylated  in  vivo  during  the  initiation  of  transcription,  and 
several  putative  CTD  kinases  have  been  described  (Dhamus  and  Dynan,  1992;  O’Brien  et 
aL,  1994). 

A CTD peptide consisting of four heptad repeats, (YSPTSPS)4RRR, is phosphorylated
by purified human DNA-PK, although the Km for this peptide substrate is above 1
mM. The CTD peptide was phosphorylated exclusively on serine (data not shown), and
sequence analysis (Figure 4) revealed that the major site of phosphorylation was the serine
at position 7, or the equivalent serine in the second (and probably in the third and fourth)
repeat. These serines are followed by tyrosine, the first residue of the next repeat. However,
tyrosine did not efficiently substitute for glutamine in the p53-derived sequence (Table 2). The
GAL4-CTD protein was reported to be phosphorylated at both serine and threonine (Dvir et al.,
1993). Thus, the number of repeats, the DNA template, or other factors may influence the
recognition of sites by DNA-PK.
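Several sequence statements in this paragraph can be checked mechanically:

- The CTD peptide contains no SQ or TQ motif.
- The major site, serine 7, recurs at serines 14, 21, and 28.
- Each of these serines is followed by a specific residue.

The short Python sketch below builds (YSPTSPS)4RRR and scans it. The helper names are hypothetical and are used only for illustration.

```python
from typing import List

HEPTAD = "YSPTSPS"


def ctd_peptide(repeats: int = 4) -> str:
    """The synthetic CTD peptide (YSPTSPS)nRRR used in the text (n = 4)."""
    return HEPTAD * repeats + "RRR"


def sq_tq_sites(seq: str) -> List[int]:
    """1-based positions of Ser or Thr residues immediately followed by Gln."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "ST" and seq[i + 1] == "Q"]


def equivalent_serines(seq: str, pos_in_repeat: int, repeats: int) -> List[int]:
    """1-based positions of residue `pos_in_repeat` in each heptad, kept only if it is Ser."""
    positions = [k * len(HEPTAD) + pos_in_repeat for k in range(repeats)]
    return [p for p in positions if seq[p - 1] == "S"]


if __name__ == "__main__":
    pep = ctd_peptide()
    print(pep)                    # YSPTSPSYSPTSPSYSPTSPSYSPTSPSRRR
    print(sq_tq_sites(pep))       # []
    sites = equivalent_serines(pep, 7, 4)
    print(sites)                  # [7, 14, 21, 28]
    # Each serine plus the residue that follows it (1-based p -> 0-based p - 1).
    print([pep[p - 1] + pep[p] for p in sites])  # ['SY', 'SY', 'SY', 'SR']
```

The last line shows that serine 28 in the fourth repeat is followed by arginine rather than tyrosine. This is one reason the fourth repeat may behave differently from the first three.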
Another consideration with regard to substrate recognition is that some sites phosphorylated
in vivo may not be phosphorylated in vitro. We alluded to a possible case in SV40
TAg. Serine 639 is phosphorylated in vivo and resembles other DNA-PK sites, but no kinase
has been identified that phosphorylates this site in vitro (Chen et al., 1991). In vivo, TAg
assembles into a dodecameric structure at the SV40 origin of replication. In our in vitro
experiments, by contrast, calf thymus DNA was used as the DNA-PK activator, and TAg most
likely binds to fragments of calf thymus DNA in a non-sequence-specific manner, primarily as
monomers. It remains to be determined whether the pattern of TAg residues phosphorylated by
DNA-PK would be different if the TAg had been assembled on replication origins.


CONCLUSIONS 

Most of the DNA-PK in normal, unsynchronized, cultured human fibroblasts and
lymphoid cells is not tightly associated with DNA; thus, it is presumably inactive (Anderson
and Lees-Miller, 1992). In vitro assays indicated that total cellular DNA-PK activity changes
only slightly with the stage of the cell cycle, and no significant increase in activity was observed
after HeLa cells were treated with DNA damage-inducing agents. These observations suggest
that DNA-PK normally may be inactive and that synthesis and degradation are not important
mechanisms for regulating its activity. We presume that DNA-PK activation occurs as a
consequence of changes to chromatin that create entry sites for the Ku DNA-binding,
regulatory subunit. Such changes may be caused by normal cellular activities, including
transcription, replication, or recombination. Alternatively, they may be a consequence of
damage to DNA caused by endogenous or exogenous agents. Further regulation may occur through
autophosphorylation (Lees-Miller et al., 1990), other posttranslational events, or the
synthesis or activation of inhibitors (An et al., 1994). Because the methods used to disrupt
cells in preparing extracts inevitably fragment chromatin, it has not been possible to
determine the state of activation of DNA-PK with existing in vitro assays. Furthermore,
decay of the ³²P label required to detect substrate phosphorylation in vivo also
produces DNA strand breaks that may activate DNA-PK. For this reason, it has not been
possible to correlate changes in the phosphorylation of putative substrates with kinase
activation. Thus, a major challenge will be to develop assays suitable for measuring DNA-PK
activity and activation in vivo. A detailed knowledge of the factors that affect DNA-PK
substrate recognition in vitro undoubtedly will be helpful for developing improved assays for
DNA-PK activity.
INTRODUCTION

In response to damaged DNA, mammalian cell growth is arrested at cell cycle
checkpoints in G1, near the border of S phase, or in G2, before mitosis (Murray, 1992; Hunter,
1993; Weinert and Lydall, 1993). In some circumstances, DNA damage initiates apoptosis,
a program that results in cell death. Recent studies have shown that the p53 tumor suppressor
protein is an essential component of the G1 checkpoint pathway (Kastan et al., 1991); it also
modulates the initiation of apoptosis (Oren, 1994). The arrest of cell cycle progression
provides time for DNA damage to be repaired, whereas apoptosis may ensure the death of
more severely damaged cells that are at risk of losing growth control through genome
rearrangements. Thus, these functions account, at least in part, for the importance of p53 in
suppressing or eliminating preneoplastic or neoplastic cells in humans and other vertebrate
species. In turn, p53 function is mediated through its physical characteristics, and these may
be modulated by post-translational mechanisms (Ullrich et al., 1992; Meek, 1994). Thus,
biophysical studies of p53 and its functional domains are fundamental to an understanding
of the properties that are important for normal p53 function.

p53 is a nuclear phosphoprotein with a short half-life, and its normal concentration
in the nucleus is low (Levine, 1993). However, p53 protein accumulates in the nuclei of cells
exposed to UV irradiation, ionizing radiation, and other DNA damage-inducing agents
(Maltzman and Czyzyk, 1984; Kuerbitz et al., 1992; Lu and Lane, 1993; Nelson and Kastan,
1994). This accumulation is thought to result from a transient stabilization of the p53
protein through undefined post-transcriptional mechanisms. Overexpression of wild-type
p53 protein through transient transfection or stable integration of a wild-type p53 cDNA
driven by an inducible promoter was shown to arrest cell growth at or near the G1/S border
(Mercer et al., 1990). However, cell cycle arrest did not occur when mutant p53s were
overexpressed. Nor did it occur in tumor cells without a functional p53 gene after they were
exposed to DNA damage-inducing agents (Kastan et al., 1991).

p53 is a sequence-specific activator of transcription. When the wild-type p53 protein
was fused to the yeast GAL4 DNA binding domain, it activated transcription from GAL4
reporter plasmids in both yeast and mammalian cells (Fields and Jang, 1990; Raycroft et al.,
1990). The fragment containing the first 73 residues of p53, which are rich in acidic amino
acids, was sufficient to confer transcriptional activation with the GAL4 domain. Thus, it was
surprising that several missense mutants with lesions in the central domain failed to
transactivate when coupled to GAL4 (Unger et al., 1992). Subsequently, wild-type p53 was
shown to bind to a specific consensus DNA sequence consisting of two copies of the 10
base-pair motif 5'-PuPuPuC(A/T)(T/A)GPyPyPy-3' (Pu, purine; Py, pyrimidine) (Funk et al.,
1992; El-Deiry et al., 1992; Halazonetis et al., 1993). Each 10 base-pair element is
palindromic, with two five base-pair motifs in opposite orientations. p53 DNA binding sites
have been identified in the 5' regions of several natural genes, including WAF1/CIP1, MDM2,
GADD45, and MCK (muscle creatine kinase). In these natural sites, the 10 base-pair elements
are separated by 0 to 13 base-pairs (Bargonetti et al., 1991; Zambetti et al., 1992; Kastan et
al., 1992; El-Deiry et al., 1993). These natural p53 binding elements confer p53 inducibility
on their associated transcription units. When placed upstream, they also confer p53
inducibility on reporter genes in vivo and in vitro (Farmer et al., 1992; Funk et al., 1992;
Kern et al., 1992; Zambetti et al., 1992). WAF1/CIP1 encodes a 21 kDa inhibitor of several
cyclin-dependent kinases that are required for entry into S phase (El-Deiry et al., 1993, 1994;
Xiong et al., 1993). Thus, induction of Waf1 synthesis in response to elevated levels of nuclear
p53 provides a mechanism that accounts for the arrest of cell cycle progression after DNA
damage or after the engineered overexpression of p53 protein in cultured cells.
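The consensus described above can be expressed as a pattern search. The minimal Python sketch below looks for exact matches to the 10-bp half-site 5'-PuPuPuC(A/T)(T/A)GPyPyPy-3'. It then reports pairs of half-sites separated by 0 to 13 bp. Because the half-site pattern is its own reverse complement, scanning one strand is enough.

The sketch makes simplifying assumptions: natural sites usually contain mismatches to the consensus, and this exact-match scan ignores them. The function names and the test sequence are hypothetical.

```python
import re
from typing import List, Tuple

# One 10-bp half-site, 5'-PuPuPuC(A/T)(T/A)GPyPyPy-3' (Pu = A/G, Py = C/T).
# The zero-width lookahead lets finditer report overlapping matches.
HALF_SITE = re.compile(r"(?=([AG]{3}C[AT][AT]G[CT]{3}))")


def half_sites(dna: str) -> List[int]:
    """0-based start positions of every exact-match half-site."""
    return [m.start() for m in HALF_SITE.finditer(dna.upper())]


def p53_sites(dna: str, max_spacer: int = 13) -> List[Tuple[int, int, int]]:
    """Pairs of half-sites separated by 0..max_spacer bp, as (start1, start2, spacer)."""
    starts = half_sites(dna)
    hits = []
    for a in starts:
        for b in starts:
            spacer = b - (a + 10)  # gap between the end of site a and the start of site b
            if 0 <= spacer <= max_spacer:
                hits.append((a, b, spacer))
    return hits


if __name__ == "__main__":
    # Made-up test sequence: two half-sites separated by a 3-bp spacer.
    demo = "TT" + "GGGCATGCCC" + "AAT" + "AGACATGTCT" + "TT"
    print(half_sites(demo))  # [2, 15]
    print(p53_sites(demo))   # [(2, 15, 3)]
```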

Two approaches, partial proteolysis and the expression of recombinant deletion
constructs, were used to demonstrate that the central, highly conserved domain of p53 is
responsible for sequence-specific DNA binding. This domain encompasses about 200 amino
acids beginning near residue 100 (Pavletich et al., 1993; Bargonetti et al., 1993; Wang et al.,
1993; Halazonetis and Kandil, 1993). It includes four of the five most highly
conserved segments of the p53 protein (Soussi et al., 1990), and most p53 mutations from
human tumors affect residues in this region (Hollstein et al., 1991). X-ray diffraction
analysis of co-crystals of the central core DNA binding domain and a double-stranded
recognition site oligonucleotide revealed the three dimensional structure of the complex
at a resolution of 2.2 Å (Cho et al., 1994). The core DNA-binding domain consists of
two antiparallel β sheets, of four and five strands, respectively. These sheets anchor the DNA
binding elements, two large loops and a loop-sheet-helix motif, on one face of the
structure. The positions of the two loops are further stabilized by a tetrahedrally
coordinated zinc atom. The most frequently detected mutations in human tumors cluster in the
gene segments that encode the large loops and the loop-sheet-helix motif, suggesting that
these elements are essential for p53's role as a tumor suppressor.

Wild-type p53 protein forms tetramers and higher order oligomeric structures in
solution (Stenger et al., 1992), and the domain responsible for oligomerization was localized
to the carboxyl terminus (Wang et al., 1993). p53 with its 47 carboxy-terminal amino acids
deleted is monomeric (Milner and Medcalf, 1991). Subsequent studies showed that amino
acids 315 to 360 were sufficient for the formation of recombinant protein tetramers (Wang
et al., 1993, 1994). Proteolytic digestion of human p53 yielded a 53 residue fragment that
contained residues 311 to 363 and formed tetramers (Pavletich et al., 1993). Although the
role of tetramerization in p53 function is not completely understood, it is required for
efficient site-specific DNA binding and contributes to p53's ability to activate transcription
from natural promoters. In addition, the tetramerization domain of mouse p53, residues 315
to 360, was shown to be sufficient for cooperation with an activated ras oncogene in cell
transformation (Reed et al., 1993). Formation of heterotetramers between wild-type and
mutant p53 in vivo may inactivate wild-type p53 function, thus potentiating tumor
development.
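A simple calculation shows why heterotetramer formation could be so damaging to wild-type function. The calculation assumes that wild-type and mutant subunits assemble at random from a common pool. This assumption is not established in the text, and real assembly (for example, co-translational assembly) need not be random.

Under that assumption, the number of wild-type subunits per tetramer follows a binomial distribution. With equal amounts of the two forms, only (1/2)^4 = 1/16 of tetramers would be purely wild type. The function name below is hypothetical.

```python
from math import comb
from typing import Dict


def tetramer_composition(f_wt: float, n: int = 4) -> Dict[int, float]:
    """Fraction of n-mers containing k wild-type subunits (k = 0..n).

    Binomial model: assumes random assembly from a pool in which a fraction
    f_wt of subunits is wild type (an illustrative assumption).
    """
    if not 0.0 <= f_wt <= 1.0:
        raise ValueError("f_wt must lie between 0 and 1")
    return {k: comb(n, k) * f_wt**k * (1 - f_wt)**(n - k) for k in range(n + 1)}


if __name__ == "__main__":
    comp = tetramer_composition(0.5)
    print(comp[4])  # 0.0625: only 1 tetramer in 16 is purely wild type
    print(comp[2])  # 0.375: mixed tetramers with two subunits of each kind
```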

Carboxy-terminal p53 sequences appear to have an inhibitory effect on site-specific
DNA binding (Hupp et al., 1992). Wild-type p53 made in E. coli binds relatively poorly to
the p53 consensus recognition sequence, but its binding is enhanced by several treatments
that affect the carboxyl terminus. These include:

- deletion of the 30 carboxy-terminal residues;
- complex formation with antibodies or other proteins that recognize carboxy-terminal
  sequences;
- phosphorylation by casein kinase II (CKII).

Casein kinase II phosphorylates the penultimate residues of mouse and human p53, Ser389
and Ser392, respectively (Meek et al., 1990; Meek, 1994). The last 27 residues of human p53
are encoded by a separate exon, suggesting that this segment may comprise a distinct
functional domain (Lamb and Crawford, 1986). The carboxyl terminus is rich in basic residues
and provides non-sequence-specific DNA binding and strand annealing activities. These
activities are independent of the central sequence-specific DNA binding domain (Wang et al.,
1993; Pavletich et al., 1993; Oberosler et al., 1993; Bakalkin et al., 1994).

Wild-type p53 is phosphorylated in vivo at several amino-terminal and carboxy-terminal
sites (Ullrich et al., 1993; Meek, 1994), and several of these sites can be phosphorylated
in vitro by known serine/threonine protein kinases. Serines 389 and 392, which are
homologous residues in mouse and human p53, respectively, are phosphorylated in vivo
and, as noted above, in vitro by CKII. Serine 312 (murine) and serine 315 (human) also are
phosphorylated in vivo, and these residues can be phosphorylated in vitro by the p34cdc2
cyclin-dependent kinase (Addison et al., 1990; Bischoff et al., 1990). Several sites in the
amino-terminal region are phosphorylated. These include serines 4, 6, 9, 15, and 32, and
threonines 78 and 88 in murine p53, and serines 9, 15, and 33 in human p53 (Wang and
Eckhart, 1992; Lees-Miller et al., 1992; Ullrich et al., 1993; Milne et al., 1994; Meek, 1994).
The following kinases act on these amino-terminal sites in vitro:

- Murine p53: serines 4, 6, and 9 are phosphorylated by a casein kinase I-like enzyme
  (Milne et al., 1992).
- Murine p53: serines 4 and 15 can be phosphorylated by DNA-activated protein kinase
  (Wang and Eckhart, 1992; Lees-Miller et al., 1992).
- Murine p53: threonines 78 and 88 are phosphorylated by MAP kinase (Milne et al., 1994).
- Human p53: serine 15 is phosphorylated by the DNA-activated protein kinase
  (Lees-Miller et al., 1992).
- Human p53: serine 33 may be phosphorylated by MAPK or JNK1, the Jun N-terminal kinase
  (Dérijard et al., 1994).

Phosphorylation is an important mechanism for regulating the activity of several
transcription factors, including Jun, NF-κB, and SRF (Jackson, 1992; Hunter and Karin, 1992).
The amino acid sequences at the five major sites phosphorylated in human p53 are conserved
in most vertebrates (Soussi et al., 1990), which suggests that phosphorylation plays an
important role in regulating p53 function. However, mutant p53s with nonphosphorylatable
residues at individual phosphorylation sites are similar to wild-type p53 in the properties
that have been examined. Even p53 with a mutant CKII site was found to be indistinguishable
from wild-type p53 with respect to the in vivo functions of transactivation, growth arrest,
and suppression of cell transformation by ras and E1A (Fiscella et al., 1994). Thus, to date,
no clear role for p53 phosphorylation has emerged. The basic structure of the human p53
protein is summarized in Figure 1.
[Figure 1 schematic. Domain labels: Transactivation (acidic); Ala, Pro-rich; Sequence-specific DNA binding, consensus Pu-Pu-Pu-C-(A/T)-(T/A)-G-Py-Py-Py; Tetramerization; Basic, non-specific DNA binding.]


Figure 1. Schematic representation of the 393 amino acid human p53 polypeptide illustrating properties of its
domains. The amino terminus of p53 contains an acidic transactivation domain (residues 1-73) followed by
an alanine-rich segment. Residues 102 to 286 contain the site-specific DNA binding domain. Carboxy-terminal
residues 319-360 provide for tetramerization, and this region is followed by a segment rich in basic residues;
the carboxy-terminal segment provides non-specific DNA binding and strand annealing functions. At least five
human p53 sites, serines 9, 15, 33, 315, and 392, are phosphorylated; kinases that phosphorylate three of these
sites in vitro have been identified. The five segments that are evolutionarily highly conserved are indicated.
References are given in the text.


Structure  of  the  Tetramerization  Domain  of  Human  p53 

To investigate the structure of the region responsible for tetramer formation, nine
peptides corresponding to carboxy-terminal segments of human p53 were chemically
synthesized by solid phase methods using Fmoc chemistry. The larger peptides, S303-D393,
K319-D393, R335-D393, and K319-D393(A323,327,330), were synthesized by a segment
condensation method using peptide thioesters (Hojo and Aimoto, 1992). Each peptide was
studied by equilibrium analytical ultracentrifugation in a Beckman XL-A analytical
ultracentrifuge. The data were analyzed by fitting to appropriate mathematical models using
MLAB (Sakamoto et al., 1994). Figure 2 shows the concentration distribution of p53(319-360)
at equilibrium, fitted as a monomer, as a monomer-dimer equilibrium, and as a
monomer-tetramer equilibrium. The equilibrium concentration distributions of peptides
S303-D393, K319-D393, and K319-G360 were entirely consistent with the formation of peptide
tetramers in equilibrium with monomers.

Residues between K319 and G334 are essential for tetramer formation. Peptide R335-D393 was
observed only as a monomer. So was the variant peptide K319-D393(A323,327,330), in which three
hydrophobic amino acids, Leu323, Tyr327, and Leu330, were changed to alanine. Truncating the
carboxy terminus had graded effects:

- Deletion of the 33 carboxy-terminal residues of p53, G361-D393, had no effect on tetramer
  formation.
- Deletion of another nine residues, to K351, reduced the association constant for tetramer
  formation.
- Deletion of another four residues, to A347, allowed only minimal association to form
  tetramers.

Thus, the segment from residues K319 to G360 forms a core domain that exhibits a strong
propensity to form tetramers; no evidence was seen for dimer formation.
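For an ideal self-associating system, the monomer-tetramer model that fits Figure 2 can be written as follows. At sedimentation equilibrium, each species is distributed exponentially in r², and the tetramer has four times the monomer's reduced molecular weight σ = M(1 - v̄ρ)ω²/RT.

The minimal Python sketch below computes such a profile. It assumes ideality, omits the baseline offset that real fits include, and uses hypothetical function names. The molar mass, partial specific volume, and concentrations in the example are illustrative values, not data from the text. Only the 32,000 rpm and 25 °C conditions come from the Figure 2 legend.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def reduced_mass(m_kg_per_mol: float, vbar_ml_per_g: float, rho_g_per_ml: float,
                 rpm: float, temp_k: float) -> float:
    """sigma = M (1 - vbar*rho) omega^2 / (R T), in m^-2, for a monomer of molar mass M."""
    omega = rpm * 2.0 * math.pi / 60.0  # angular velocity, rad s^-1
    return m_kg_per_mol * (1.0 - vbar_ml_per_g * rho_g_per_ml) * omega**2 / (R_GAS * temp_k)


def monomer_only_profile(r: float, r_ref: float, c_ref: float, sigma: float) -> float:
    """A single ideal species of reduced mass sigma, for comparison."""
    return c_ref * math.exp(sigma * (r**2 - r_ref**2) / 2.0)


def monomer_tetramer_profile(r: float, r_ref: float, c1_ref: float,
                             k4: float, sigma: float) -> float:
    """Total concentration at radius r (m) for an ideal monomer-tetramer system.

    c1_ref is the monomer concentration at the reference radius r_ref.
    Concentrations are on a weight (or absorbance) scale, so the tetramer
    term is k4 * c1**4 with k4 expressed on that same scale. Because the
    tetramer's reduced mass is 4*sigma, this local mass-action form holds
    at every radius.
    """
    c1 = monomer_only_profile(r, r_ref, c1_ref, sigma)
    return c1 + k4 * c1**4


if __name__ == "__main__":
    # Hypothetical inputs: 5 kg/mol monomer, vbar 0.73 mL/g, density 1.0 g/mL,
    # 32,000 rpm and 298.15 K.
    sigma = reduced_mass(5.0, 0.73, 1.0, 32000, 298.15)
    for r in (0.068, 0.069, 0.070, 0.071, 0.072):
        print(r, monomer_tetramer_profile(r, 0.070, 0.1, 50.0, sigma))
```

Setting `k4` to zero recovers the monomer-only curve. Adding a dimer term k2·c1² with reduced mass 2σ gives the monomer-dimer-tetramer model mentioned in the Figure 2 legend.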

Figure 2. Concentration distribution of p53(319-360), expressed as absorbance at 231 nm, as a function of
radial position in the ultracentrifuge cell at equilibrium at 32,000 rpm and 25°C. The lines are best fits
for the following models of association: (1) a monomer-tetramer equilibrium, (2) a monomer-dimer
equilibrium, and (3) monomer alone. The failure of the latter two models is readily apparent. When a
monomer-dimer-tetramer equilibrium model was used for fitting, a value of zero was obtained for K12,
indicating that dimer was not present in detectable quantities.

The thermodynamic parameters describing p53 tetramerization were obtained by the
method of Clarke and Glew (1966). This procedure involves plotting R ln K as a function
of absolute temperature and fitting the data with a mathematical model derived from basic
thermodynamic principles. An examination of the thermodynamic characteristics of p53
peptides containing the tetramerization domain was revealing (Figure 3).

A comparison of peptides S303-D393 and K319-D393 showed that K319-D393 has a greater
propensity to form tetramers (smaller dissociation constant, KD, Figure 3). This difference is
a consequence of a 0.6 kcal mol⁻¹ increase in the magnitude of the standard free energy change
(ΔG°). A similar comparison of S303-G360 with K319-G360 revealed an even greater increase in
the magnitude of ΔG° on removal of the amino-terminal segment, this time by 1.3 kcal mol⁻¹.
Thus, the amino acid segments on both the carboxyl- and amino-terminal sides of the
tetramerization domain adversely affect the ability of the core domain to form tetramers.

Comparable data for the intact p53 protein currently are not available. It would be of interest
to compare its thermodynamic parameters with those of the model peptides. Similarly, these
parameters could be compared between unmodified and post-translationally modified p53 or
p53 peptides. As noted above, serines 315 and 392 are modified by phosphorylation, and one or
more additional residues in the carboxy-terminal region may be modified by phosphorylation or
glycosylation.
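The Clarke-Glew analysis can be sketched as a linear least-squares problem. If ΔCp° is assumed constant over the temperature range (the full method allows higher-order terms), then:

R ln K(T) = -ΔG°(θ)/θ + ΔH°(θ)(1/θ - 1/T) + ΔCp°(θ/T - 1 + ln(T/θ))

This expression is linear in ΔG°(θ), ΔH°(θ), and ΔCp°, where θ is a reference temperature. K must be taken relative to a 1 M standard state, with K and ΔG° referring to the same direction of reaction. The Python sketch below fits these three parameters. The helper `fold_change` converts a free-energy difference into a ratio of equilibrium constants. At an assumed 25 °C, the 0.6 and 1.3 kcal mol⁻¹ differences quoted above correspond to roughly 2.8-fold and 9-fold changes in KD. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

R_CAL = 1.98720  # gas constant, cal mol^-1 K^-1


def _basis(temp_k: float, theta: float) -> Tuple[float, float, float]:
    """Regressors multiplying dG(theta), dH(theta) and dCp in the truncated Clarke-Glew form."""
    return (-1.0 / theta,
            1.0 / theta - 1.0 / temp_k,
            theta / temp_k - 1.0 + math.log(temp_k / theta))


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def clarke_glew_fit(temps_k: Sequence[float], ln_k: Sequence[float],
                    theta: float = 298.15) -> Tuple[float, float, float]:
    """Return (dG, dH) in cal/mol and dCp in cal/mol/K, all referred to temperature theta."""
    rows = [_basis(t, theta) for t in temps_k]
    y = [R_CAL * v for v in ln_k]
    # Scale columns to unit length so the normal equations stay well conditioned.
    scale = [math.sqrt(sum(row[j] ** 2 for row in rows)) for j in range(3)]
    x = [[row[j] / scale[j] for j in range(3)] for row in rows]
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(3)]
    beta = _solve3(xtx, xty)
    dg, dh, dcp = (beta[j] / scale[j] for j in range(3))
    return dg, dh, dcp


def fold_change(ddg_kcal: float, temp_k: float = 298.15) -> float:
    """Ratio of equilibrium constants implied by a free-energy difference (kcal/mol)."""
    return math.exp(ddg_kcal * 1000.0 / (R_CAL * temp_k))
```

To check the fit, generate ln K values from known parameters with the same equation and confirm that `clarke_glew_fit` recovers them. Real data would, of course, carry experimental error.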

Figure 4. Schematic ribbon drawing of the structure of the tetramerization domain of human p53,
residues 319-360. The tetramer is represented as a dimer of dimers, with the dimers at approximately
right angles to each other. This view is an expanded representation of the tetramerization
model shown in Figure 5.

Circular dichroism (CD) was used to investigate the secondary structure of the p53
peptides. From the CD spectrum, it is possible to estimate the fraction of the residues that
form α helices or β sheets; in particular, α helices are associated with a large negative
ellipticity at 222 nm (Johnson, Jr., 1988). All of the peptides that failed to form tetramers,
e.g. K319-D393(A323,327,330) and R335-D393 (Figure 3), also had small ellipticity values
(θ222 > -2000). In contrast, K319-G360 and K319-D393 had large negative ellipticity values
(θ222 < -5000); both peptides are nearly 90% tetramer at the concentrations used for the CD
measurements. The ellipticity of K319-G360, θ222 = -9580, was larger in magnitude than that
of K319-D393. This implies that the carboxy-terminal segment distal to the tetramerization
domain has a low helical content. However, the core tetramerization domain cannot be a
typical coiled-coil four-helix bundle, since the ellipticity of this segment is not indicative
of the required α-helical content. Substitution of three hydrophobic amino acids with alanine
within the core domain prevented both tetramer formation and the acquisition of an α-helical
conformation. Thus, these residues are critical for forming the correct secondary structure
that, in turn, is necessary for oligomerization. Interestingly, these residues, Leu323, Tyr327,
and Leu330, lie outside the segment, residues 331-360, that is most likely to adopt an α-helical
structure.
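The CD argument rests on a two-state estimate. The mean residue ellipticity at 222 nm is interpolated between reference values for fully helical and fully unordered chains. The Python sketch below shows that arithmetic.

The reference ellipticities are deliberately left as parameters. The text does not give them, and published values vary, for example with chain length. The concentration, path length, and reference numbers in the example are hypothetical; only the 42-residue length matches K319-G360. The function names are hypothetical.

```python
def mean_residue_ellipticity(theta_mdeg: float, conc_m: float,
                             path_cm: float, n_residues: int) -> float:
    """Convert observed ellipticity (mdeg) to mean residue ellipticity (deg cm^2 dmol^-1).

    conc_m is the molar peptide concentration and path_cm the cell path length.
    """
    return theta_mdeg / (10.0 * conc_m * path_cm * n_residues)


def helix_fraction(theta_222: float, theta_helix: float, theta_coil: float) -> float:
    """Two-state estimate of the alpha-helical fraction from [theta] at 222 nm.

    theta_helix and theta_coil are user-supplied reference values for a fully
    helical and a fully unordered chain (no defaults are assumed here).
    """
    return (theta_222 - theta_coil) / (theta_helix - theta_coil)


if __name__ == "__main__":
    # Hypothetical measurement: -20 mdeg, 50 uM peptide, 0.1 cm cell, 42 residues.
    mre = mean_residue_ellipticity(-20.0, 50e-6, 0.1, 42)
    # Hypothetical reference values, for illustration only.
    print(mre, helix_fraction(mre, theta_helix=-30000.0, theta_coil=-2000.0))
```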

Recently, the three dimensional structure of the core tetramerization domain was
determined using multidimensional NMR spectroscopy (Clore et al., 1994). NMR spectroscopy
was carried out on three kinds of sample:

- unlabeled samples;
- uniformly isotope-labeled peptides;
- mixed tetramers composed of equal amounts of the unlabeled and isotopically labeled
  peptides.

Inter-proton distance constraints were derived from isotope-edited and filtered 3D
and 4D NOE spectra (Bax et al., 1993; Clore and Gronenborn, 1994). With this methodology,
it is possible to determine the positions of amino acid residues within each subunit as well
as the relative positions of the different subunits within the tetramer. Figure 4 shows a
schematic ribbon drawing of the structure of the tetramerization domain as determined by
NMR.

Each 42 amino acid peptide monomer consists of a short turn (D324-G325) followed
by a β strand (E326-G334), another short turn (R335-E336), and an α helix (R337-A355). The
first residues at the amino terminus, K319-L323, and the four carboxy-terminal residues,
K357-G360, were disordered in solution. Two monomer peptides form a dimer in which the
α helices and β strands are antiparallel and interact. Two dimers interact to form a tetramer
through their α helices; in this case the interacting helices are aligned in an antiparallel
manner. The β strands lie on the outside of the tetramer, on opposite faces. Thus, the tetramer
is best described as a dimer of dimers, with the two dimers interacting through their α
helices, which form a four-helix bundle. Because we were unable to detect dimers in the
ultracentrifugal analysis, the association constant for tetramer formation from dimers must
be much greater than the association constant for dimer formation from monomers.
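The inference at the end of this paragraph can be illustrated with a mass-balance calculation for a monomer-dimer-tetramer system:

- [D] = K12[M]²
- [T] = K24[D]²
- c_total = [M] + 2[D] + 4[T]

The Python sketch below solves for the free monomer by bisection and reports weight fractions. The equilibrium constants in the example are hypothetical. With K24 much larger than K12, almost no material is present as dimer, which is the situation implied by the ultracentrifugation data. The function name is hypothetical.

```python
from typing import Tuple


def speciation(c_total: float, k12: float, k24: float,
               iterations: int = 200) -> Tuple[float, float, float]:
    """Weight fractions (monomer, dimer, tetramer) at total monomer concentration c_total (M).

    Model: [D] = k12 [M]^2 and [T] = k24 [D]^2 (both constants in M^-1),
    so c_total = [M] + 2[D] + 4[T]. The total increases monotonically with [M],
    so bisection on [0, c_total] converges to the free monomer concentration.
    """
    def total(c1: float) -> float:
        d = k12 * c1 * c1
        return c1 + 2.0 * d + 4.0 * k24 * d * d

    lo, hi = 0.0, c_total  # total(0) = 0 and total(c_total) >= c_total
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total(mid) < c_total:
            lo = mid
        else:
            hi = mid
    c1 = 0.5 * (lo + hi)
    d = k12 * c1 * c1
    t = k24 * d * d
    return c1 / c_total, 2.0 * d / c_total, 4.0 * t / c_total


if __name__ == "__main__":
    # Hypothetical constants: weak dimerization, very strong dimer-dimer association.
    print(speciation(1e-5, k12=1e3, k24=1e15))  # dimer fraction is negligible
    # Hypothetical contrast: strong dimerization, weak dimer-dimer association.
    print(speciation(1e-5, k12=1e7, k24=1e3))   # dimer dominates
```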

The crucial structural element for tetramer formation appears to reside in the β-strand
structures of corresponding dimers. This is shown by the number of NOEs between the β strands,
compared with the number between the helices of the adjacent subunits. A recombinant p53
protein has been produced in which the carboxy terminus of p53 distal to residue 333 was
replaced by the leucine zipper dimerization domain of the yeast transcription factor GCN4
(Pietenpol et al., 1994). The replaced region included the turn and the α-helical region of the
tetramerization domain. This hybrid p53 protein cannot form tetramers, suggesting that the
small loop connecting the β strand to the helical segment also is important for tetramer
formation. In addition to hydrophobic interactions between helices in the four-helix bundles,
several salt bridges contribute to stabilization of dimer-dimer interactions in the tetramers.
Most p53 mutations from human tumors are located in regions encoding the sequence-specific
DNA binding domain; however, four,
which  cause  the  substitution  of  His  for  Leu^^^,  Val  for  Gly^^^,  Cys  for  Arg^^^,  and  Asp  for 
Glu^"^^,  lie  within  sequences  encoding  the  oligomerization  domain  (Greenblatt  et  al.,  1994). 
These  residues  are  located  at  the  interface  of  the  dimers;  thus,  they  may  partially  disrupt 
interactions  which  are  required  for  tetramer  formation. 

The specific dimer-of-dimers topology of the p53 tetramerization domain has not been observed previously in other tetrameric protein structures. This topology places some constraints on the relative locations of the four amino-terminal segments of the p53 tetramer. These segments include the central site-specific DNA-binding domain comprising residues T102 to K292 (Cho et al., 1994). The topology suggests that the DNA-binding domains of adjacent subunits may be closer to each other than those of non-adjacent subunits. This would permit a close apposition, within the dimer, of the sequence-specific DNA-binding domain and the carboxy-terminal, basic, non-sequence-specific DNA-binding tail segment (residues 361-393) (Figure 5). This apposition may explain how alterations to the carboxy terminus of p53 affect sequence-specific DNA binding (Hupp et al., 1992). Such alterations include phosphorylation of serine 392 and deletion of the carboxy-terminal 30 residues. Based on this predicted structural arrangement (Figure 5), we envision each monomer of one pair of adjacent subunits (AC) contacting one-quarter of the consensus sites in one 10 base-pair palindrome. The other dimer (BD) would contact equivalent sites in the second 10 base-pair palindrome. This disposition of binding-site interactions might be expected to induce a sizable bend in the DNA. No such bending is observed in the co-crystal structure reported by Cho et al. (1994). However, in that complex only two DNA-binding domain monomers were associated with the consensus-site oligonucleotide, and the tetramerization domain was not present.

Figure 5. A. Primary structure of the human p53 protein showing the location of the various domains along the polypeptide chain. B. Model of the tetrameric arrangement of the human p53 protein illustrating the proposed positions of the DNA-binding domains with respect to the tetramerization domains.
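The quarter-site bookkeeping behind this binding model can be made concrete. The sketch below assumes the commonly cited p53 consensus half-site RRRCWWGYYY (IUPAC codes), which is not given in the text above, and it ignores the spacer between half-sites. The function names are hypothetical. The sketch checks that each 10 bp half-site is an inverted repeat of two 5 bp quarter sites. It follows that a two-half-site element offers four quarter sites, one per monomer of the tetramer.

```python
# IUPAC complements for the (degenerate) codes used in the assumed consensus
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
              "R": "Y", "Y": "R", "W": "W", "S": "S"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a possibly degenerate DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def quarter_sites(half_site: str) -> tuple[str, str]:
    """Split a 10 bp half-site into its two 5 bp quarter sites."""
    if len(half_site) != 10:
        raise ValueError("expected a 10 bp half-site")
    return half_site[:5], half_site[5:]


def response_element_quarters(half_site: str, n_half_sites: int = 2) -> list[str]:
    """Quarter sites of an element built from repeated half-sites (spacer ignored)."""
    quarters: list[str] = []
    for _ in range(n_half_sites):
        quarters.extend(quarter_sites(half_site))
    return quarters


HALF_SITE = "RRRCWWGYYY"  # assumed consensus half-site, not from the text
q1, q2 = quarter_sites(HALF_SITE)
print(q1, q2, reverse_complement(q1) == q2)  # RRRCW WGYYY True
print(len(response_element_quarters(HALF_SITE)))  # 4, one per monomer
```

Under these assumptions, the AC dimer would occupy the two quarter sites of one half-site and the BD dimer those of the other.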


CONCLUSIONS 

Sedimentation equilibrium, a quantitative physical technique, was used to show that peptides derived from the carboxy-terminal oligomerization region of human p53 form stable tetramers. These tetramers are in equilibrium with peptide monomers in aqueous solution. Sedimentation equilibrium is less commonly used today than gel electrophoretic techniques (Stenger et al., 1994). However, the behavior of interacting components during sedimentation equilibrium is rigorously described by reversible thermodynamics. Thus, the quantitative measurements that are readily made with modern instrumentation can be meaningfully interpreted in terms of association constants and changes in the free energy of association. Using this technique together with modern methods for the efficient synthesis of longer peptides, we have defined the optimal segment of p53 for the formation of tetramers. CD, an optical technique for estimating secondary structure content, was then used to estimate the secondary structure of the tetramerization domain. Subsequently, the three-dimensional structure of the tetramerization domain was determined by high-resolution multidimensional NMR techniques. This structure confirmed that the segment of p53 between residues K319 and G360 forms tetramers. It also showed that both the amino-terminal β-sheet element, Y327-G334, and the carboxy-terminal α-helical region, R337-A355, contribute to the structure. The structure is best described as a dimer-of-dimers, with the opposing dimers held together principally through interactions between their α helices.
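The link between sedimentation equilibrium data, association constants and free energies can be sketched for an ideal monomer-tetramer system. This is a minimal sketch under stated assumptions, not the authors' fitting procedure:

- the solution is ideal;
- monomer and tetramer share one partial specific volume;
- concentrations are expressed as molar monomer units;
- the helper names and all parameter values are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def sigma(molar_mass, vbar, rho, rpm, temperature):
    """Reduced buoyant molecular weight of the monomer, in m^-2.

    molar_mass in kg/mol, vbar (partial specific volume) in m^3/kg,
    rho (solvent density) in kg/m^3, temperature in K.
    """
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity, rad/s
    return molar_mass * (1.0 - vbar * rho) * omega ** 2 / (R * temperature)


def total_conc(r, r_ref, m_ref, k4, sig):
    """Total concentration (M, monomer units) at radius r (m) for 4 M <=> T.

    m_ref: free monomer concentration (M) at reference radius r_ref (m)
    k4:    association constant [T]/[M]^4, in M^-3
    """
    monomer = m_ref * math.exp(sig * (r ** 2 - r_ref ** 2) / 2.0)
    tetramer = k4 * monomer ** 4  # the 4th power gives the tetramer 4*sigma
    return monomer + 4.0 * tetramer  # each tetramer holds four monomer units


def delta_g(k4, temperature):
    """Standard free energy of association, J/mol (1 M standard state)."""
    return -R * temperature * math.log(k4)


# Hypothetical parameters: ~4.5 kDa peptide, 40,000 rpm, 20 degC
sig = sigma(4.5, 0.73e-3, 1000.0, 40000, 293.15)
profile = [total_conc(0.068 + i * 0.0005, 0.068, 1e-5, 1e14, sig) for i in range(9)]
print(f"sigma = {sig:.3e} m^-2; dG = {delta_g(1e14, 293.15) / 1000:.1f} kJ/mol")
```

In a real analysis, `m_ref` and `k4` would be fitted to the measured radial profile. The more negative the resulting ΔG°, the more stable the tetramer.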

The topology of the tetramerization domain has important implications for the biological properties of p53. It places some constraints on the relative positions of the sequence-specific DNA-binding domains of the monomers within the tetramer. The DNA-binding domain is amino-terminal to the tetramerization domain and is connected to it by a linker of about 30 residues that may be flexible. The basic tail of p53, residues G361 through the carboxy terminus, modulates sequence-specific DNA binding. Together with the tetramerization domain, this tail has non-specific DNA-binding properties. The topology of the tetramerization domain suggests that the tail may modulate the conformation of the site-specific DNA-binding domain within adjacent dimers. This could prevent the latter domain from adopting an optimal structure for sequence-specific DNA recognition. Although the tetramerization domain facilitates the interaction of p53 with DNA, it may also place the carboxy-terminal tail in position to regulate p53 binding to its consensus sites. The juxtaposition of DNA-binding domains in the tetramer also suggests that p53 may bend DNA. Strong bending of DNA has been observed in other transcription factor-DNA complexes and may be important for the activation of transcription (Harrington and Winicov, 1994). Thus, the arrangement of domains provided by the dimer-of-dimers topology of the tetramerization element may provide an adaptable DNA-reading mechanism. Such a mechanism could expedite specific interactions with p53 response elements in response to DNA damage. The speculations raised here will undoubtedly be tested in the near future, when new high-resolution p53 structures become available. A precise understanding of the relationship of the domains in the p53 tetramer may suggest ways to manipulate the interaction of wild-type and mutant p53s with each other and with DNA. The discovery of drugs that circumvent the loss of p53 function is an exciting prospect that would have wide implications for cancer treatment.
INTRODUCTION

Mammalian alcohol dehydrogenases (ADH) constitute a well-studied enzyme system composed of sub-forms at different levels of multiplicity. The family has diverged into a number of different enzymes. At the next level (Fig. 1), alcohol dehydrogenase occurs in fairly different forms (“classes”) with distinct structural and enzymatic properties. The subsequent level comprises still more similar forms (“isozymes”), with gradual differences in properties and fewer residue exchanges.

At the class level, five different classes have thus far been characterized in humans. If other mammals are also considered, mammalian alcohol dehydrogenases appear to comprise at least six classes (Jörnvall and Höög, 1995). Knowledge is by far most extensive for the class I, III, and IV enzymes:

- class I is the classical liver alcohol dehydrogenase, with considerable ethanol dehydrogenase activity;
- class III is the ubiquitous glutathione-dependent formaldehyde dehydrogenase;
- class IV is the stomach enzyme, with high ethanol and retinol dehydrogenase activity.

(Vallee and Bazzone, 1983; Koivusalo et al., 1989; Yang et al., 1993; Parés et al., 1994)


CLASSES:  DISTINCT  PROPERTIES 

The classes behave more or less as separate enzymes. Their distinct properties are summarized in Table 1 for the three best-characterized classes, I, III, and IV. The distinctions concern functional and structural properties as well as molecular building units, origins, and expression. Class I is the classical liver alcohol dehydrogenase, a major enzyme in the metabolism of ingested ethanol, for which methylpyrazole is a strong competitive inhibitor. Class III is identical to glutathione-dependent formaldehyde dehydrogenase. It has limited ethanol dehydrogenase activity, detectable only at high substrate concentrations, and virtually no sensitivity to methylpyrazole. Class IV has a limited organ distribution (epithelia, with emphasis on the stomach) and is the class with the highest ethanol activity. It also has considerable retinol dehydrogenase activity.

Methods in Protein Structure Analysis, Edited by M. Z. Atassi and E. Appella
Plenum Press, New York, 1995

L. Hjelmqvist et al.

[Figure 1: diagram of multiplicity levels (Family, Enzymes, Classes, Isozymes) within the medium-chain dehydrogenase/reductase (MDR) line.]

Figure 1. Different levels of multiplicity representing separate stages of gene duplications. Branch points and relative branch lengths at the class and isozyme levels are as presently estimated from available structures. In particular, the order of interconnection of classes II, V, and VI is still considered tentative, because few such structures are known. MDR, medium-chain dehydrogenases/reductases. SDH, sorbitol dehydrogenase. ADH, alcohol dehydrogenase. VAT-1, protein in synaptic vesicles of Torpedo electric organs. ζCr, ζ-crystallin. ER, enoyl reductase of fatty acid synthase. I-VI, classes of ADH. Bottom letters indicate isozyme subunits of Uromastix class I ADH (a, b), horse ADH I (E, S), human ADH I (α, β, γ), and cod ADH III (h, l).

Structurally, the classes of alcohol dehydrogenase are clearly related and of similar overall conformation. This has been deduced both by modelling (Eklund et al., 1990) and by recent crystallographic investigations (cf. El-Ahmad et al., 1995). Nevertheless, they also exhibit distinct structural properties. Rates of structural change differ (cf. below). There is an approximately 3-fold difference between the “constant” class III enzyme and the “variable” class I enzyme, with the class IV form in between (Kaiser et al., 1989; Farrés et al., 1994). Internally, too, the proteins differ in their variable segments and in the positions of these segments (cf. Patterns). The classes all derive from gene duplications (below; cf. Jörnvall et al., 1995). Finally, expression sites (Table 1) and amounts differ. In short, the classes resemble different enzymes.

Classes I and III can even be assigned different EC numbers. On the basis of their long-chain alcohol dehydrogenase activity, both are EC 1.1.1.1. On the basis of their more distinct activities, however, the ethanol-active class I enzyme remains EC 1.1.1.1, while the formaldehyde-active class III enzyme is EC 1.2.1.1. They would hardly have been considered merely classes (rather than separate enzymes) had they not shared a few substrates. These shared substrates are in particular long-chain alcohols, most easily detected through a common octanol dehydrogenase activity (cf. Danielsson et al., 1994a).

Table 1. Properties of classes I, III, and IV alcohol dehydrogenases,
illustrating distinct differences

                     I                III               IV
Active site          EtOH/MePyr       HCHO-GSH          EtOH/high kcat
Overall structure    Variable         Constant          Intermediate
Molecular segments   V1-3(I)          V1-2(III)
Origin               From III         Ancestral         Common with I
                     (subpiscine)     (subvertebrate)   (subamphibian)
Expression           Liver            Ubiquitous        Epithelial

V1-3(I) indicates variable segments 1-3 in the class I enzyme, and V1-2(III) those in the class III enzyme (Danielsson et al., 1994a). MePyr denotes inhibition by methylpyrazole, EtOH ethanol, HCHO formaldehyde, and GSH glutathione. "High kcat" indicates a high kcat with ethanol.


ORIGIN:  GENE  DUPLICATIONS 

As with other protein families, the common structures derive from a set of gene duplications followed by mutational events. Class III is the parent form. It is present in more or less unaltered form in prokaryotes, yeasts, plants, insects and other invertebrates, and vertebrates. Details of the formation of the other classes are thus far limited. Available evidence, however, suggests that the class I line was the first to branch off, in early vertebrate times. Three lines of evidence support this:

- evaluation of evolutionary speed (Cederlund et al., 1991);
- the absence of class I in lines originating before bony fish (Danielsson et al., 1994b);
- the presence of a class I/III-mixed form in the line where class I first occurs (bony fish; Danielsson and Jörnvall, 1992).

Similarly, calculation of phylogenetic trees suggests that class IV has a common origin with class I. This origin lies at a somewhat later stage, although still fairly early in vertebrate evolution, as shown by the presence of class IV in amphibians (unpublished results, together with Parés et al.). In summary, the class system appears to accompany the evolution of vertebrates. It arose through a series of gene duplications that started early, with class III as the ancestral form.


PATTERNS:  TYPICAL  AND  ATYPICAL  PROPERTIES 

In many respects, class I and class III illustrate different principles. Class III is a “typical” protein of basic metabolism, whereas class I is an “atypical” protein with unexpected properties. This is clearly visible in both the rate and the positions of evolutionary changes.

Regarding rate, class III is “constant”, meaning that its rate of change is very close to that of glycolytic enzymes such as glyceraldehyde-3-phosphate dehydrogenase and enolase (Danielsson et al., 1994a). Similarly, its segments of maximal variability affect two superficial regions (called 1 and 2 in the right panel of Fig. 2). Both regions lie at the surface, away from the entrance to the active site. Class III thus shows little variability overall, and the regions that do differ lie away from the active site. This is the pattern expected for a basic enzyme of central importance in cellular metabolism, and hence one common to many proteins in general.

In  contrast,  the  class  I  enzyme  has  a  faster  evolutionary  rate  overall,  and  exhibits 
three  variable  segments,  all  affecting  functionally  important  parts.  Thus,  one  of  the  variable 
segments  (1  in  the  left  panel  of  Fig.  2)  affects  part  of  the  entrance  to  the  active  site.  Another 
(2  in  Fig.  2,  left)  affects  the  segment  around  one  of  the  zinc  atoms  of  this  metalloenzyme, 
and  the  third  (3  in  Fig.  2,  left)  affects  the  major  area  of  subunit  interactions.  Therefore,  the 
class  I  enzyme  exhibits  an  atypical  pattern,  with  variability  at  functionally  important 
segments,  and  with  few  strictly  conserved  properties. 

The remaining classes are at present difficult to judge in this detail, because each is thus far known from a single species only. Classes I and III, by contrast, are both established in close to twenty species, with the properties stated above. However, a recent evaluation of species divergence in class IV, based on the rat and human proteins, shows that class IV likely lies between the constant class III and the variable class I forms (Farrés et al., 1994). Similarly, preliminary data on the remaining classes suggest that at least some of them are even more variable. Consequently, the alcohol dehydrogenase system encompasses an impressive range of enzymes. This range extends from constant, typical, basic enzymes to variable forms, and also includes intermediately variable and possibly super-variable forms. Nevertheless, despite this inter-class variability, the properties within each class are largely maintained over long time periods. Evolutionary variability is thus itself a distinct property of each class.

Figure 2. Fundamental differences in molecular patterns between the class I and III alcohol dehydrogenases. The top models show the subunit conformations and the bottom lines the primary structures. In both cases, the most variable segments (species variants) are represented by thick lines, and the remaining parts by thin lines. The thick lines are numbered to correspond to V1-3(I) and V1-2(III) in Table 1, respectively. As noted, the variability patterns differ. In class III they affect superficial and “non-functional” sites, as is typical of highly conserved proteins with basic functions. In class I they affect functional sites, an atypical, non-conserved pattern interpreted to suggest the emergence of novel functions and interactions.


CLASS-MIXED  PROPERTIES:  EMERGENCE  OF  NOVEL  FORMS 

The alcohol dehydrogenase system, with its multiplicity of forms derived through gene duplications at several levels, also offers examples of what appears to be the emergence of novel forms. Thus far, two such occasions have been discerned at the class level. In each case, the novel form originated through a duplication and evolved via subsequent mutations. It appears to have acquired enough mutations to gain novel class-distinct properties, but not enough to lose the traces of its derivation from the original class. The first such example in this family was the ethanol-active enzyme from bony fish, represented by the analysis of the cod enzyme (Danielsson and Jörnvall, 1992).
As summarized in Table 2, this protein is structurally more related overall to the class III enzyme than to the class I enzyme. Its identities with the corresponding two human proteins are 64% and 55%, respectively. Yet its enzymatic properties toward ethanol are like those of a class I enzyme (cf. Km values, Table 2). It therefore appears that changes at active-site residues have given rise to class I properties. The origin from class III is still visible in the overall residue identities that remain with that class. Even now, in much later, divergent lines, we thus appear able to observe the emergence, or enzymogenesis (Danielsson and Jörnvall, 1992), of the novel enzyme type (class I) from the parent form (class III).

Distinctive Class Relationships within Vertebrate Alcohol Dehydrogenases

Table 2. Class-mixed properties of a piscine enzyme (class I/III mixed
properties) and an avian enzyme (class I/II mixed properties)

"Hybrid enzyme"   Compared with            Human I    Human III
Cod I             Structure                55%        64%*
                  Function (Km, ethanol)   1.2/1.1*   1.2/NS

                  Compared with            Human I    Human II
Ostrich II        Structure                61%        69%*
                  Function (Km, ethanol)   0.7/1.1*   0.7/120

The cod enzyme is called Cod I, following the class nomenclature of its functional assignment. The ostrich enzyme is called Ostrich II, after its structural assignment, because the ostrich (like other avian species) also has a true class I enzyme. As shown by the values marked *, each enzyme is structurally most closely related to one class and functionally to another. Values in the structure lines refer to residue identities with the classes of the human enzymes. Values in the function lines refer to Km values (in mM) with ethanol as substrate, given as hybrid enzyme/human enzyme. NS, non-saturable.
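Residue-identity figures such as 64% versus 55% are, in their simplest form, the fraction of identical residues over aligned positions. The sketch below is minimal and rests on three assumptions:

- the two sequences are already aligned (equal length, '-' for gaps);
- gapped positions are excluded from the denominator;
- the function name and toy fragments are hypothetical.

Other conventions, such as dividing by the length of the shorter sequence, give somewhat different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * identical / compared


# Hypothetical toy fragments, not real ADH sequences
print(f"{percent_identity('ACDEFGHIK', 'ACDEYGHIK'):.1f}%")
```

Applied to full-length aligned ADH subunits, a function like this yields values of the kind listed in the structure lines of Table 2.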

Recently, in as yet unpublished work, we have detected the same overall pattern for the duplication that gave rise to class II. In this case, the class-mixed form was found in the avian line, more specifically in ratite liver. It is a structurally class II protein from ostrich (Hjelmqvist et al., unpublished). Here too, the protein has acquired class I ethanol dehydrogenase activity but retains an overall structural relationship to class II (Table 2). The origin of class II is not well established (cf. legend to Fig. 1), so the exact branching points are thus far unknown. Regardless of the order of events, however, the ostrich enzyme has class-mixed properties (Table 2).

In conclusion, the classes are distinct and retain separate properties over long periods of time. Nevertheless, novel forms emerge after duplication, giving rise to enzymogenesis of new activities while keeping traces of the parent form. Furthermore, class-mixed properties, once locked into an animal line, appear to survive to the present, even when they are of distant origin. The present-day cod enzyme, which reflects the early vertebrate class I/III duplication, is an example. Further elucidation of all the mammalian and human alcohol dehydrogenase classes and their origins may be expected to reveal still further acquisitions of novel functions.


ISOZYME  DIVERGENCE:  MULTIPLE  EVENTS 

To some extent, the development of isozymes also illustrates the emergence of mixed properties. The isozymes are of more recent duplicatory origin than the classes (Fig. 1). They have kept many substrates in common (hence they are true isozymes) but have still acquired novel substrate distinctions. This was noticed early, for example, in the horse E (ethanol-active) and S (steroid-active) liver alcohol dehydrogenases. Isozymes have thus far been detected in three class I lines: humans and other primates, horse, and Uromastix lizard. In all three, the corresponding gene duplications lie at separate positions in the phylogenetic tree (Jörnvall et al., 1995). This suggests that isozyme formation has occurred repeatedly during vertebrate evolution through multiple duplicatory events. Some of these events are fairly recent (as in the primate line) and some more distant (as in the lizard line). Despite the multiple events, the positions affected by isozyme differences are, notably, partly identical in the different lines. In the horse enzyme, only ten positions differ between the isozyme subunits, and no fewer than three of these are also affected by isozyme differences in the human and Uromastix enzymes (Fig. 3). This coincidence again suggests that some regions are especially variable, and that this variability is noticeable also at the isozyme level.

Figure 3. Positions of residue differences between isozyme subunits of the three species with characterized class I alcohol dehydrogenase isozyme differences. As shown, 30% of the differences in the horse enzyme (at positions 17, 43, and 94) coincide exactly with differences at the same positions in the other two isozyme systems.

Isozymes have recently been detected in class III as well (Danielsson and Jörnvall, 1992). They differ in specific activity: they are composed of h and l chains, named for the high- and low-activity forms, respectively. This again illustrates the apparent emergence of novel properties.

In  conclusion,  isozyme  development  has  occurred  repeatedly,  establishing  that  the 
gene  duplications  have  multiple  origins  with  known,  emerging  isozyme  patterns  in  different 
lineages. 


COMMON  PROPERTIES 

In view of the many distinct properties that have arisen in both isozyme and class development, it should be stressed, however, that the overall conformational properties of the enzymes have been maintained over long time periods. This is already visible in the residue conservation pattern of alcohol dehydrogenases: glycine is by far the most conserved residue in the MDR family. Table 3 illustrates this with three columns, taken from summaries made at three different times:

- 1977, when just two alcohol dehydrogenases (horse and yeast) were known (first column);
- 1993, when five different enzyme lines were compared (second column);
- 1994, when ten-odd dehydrogenases/reductases were combined within the MDR family (third column).

In all cases, and clearly already at the first comparison, the conserved nature of glycine stands out. These facts illustrate the value of comparisons at any stage. They also establish the unique nature of the small glycine residue, which lacks a side chain.

As shown in the bottom figure insert of Table 3, the conserved glycine positions largely correspond to reverse turns in the conformation of mammalian class I alcohol dehydrogenase. This establishes that these turns, and hence the overall conformation, are largely conserved. Of course, glycine conservation is expected in a protein family as long as conformation and functional properties are conserved.
Table 3. Conserved residues, illustrating the excess of glycine among such
residues (top columns) and their conformational positions

Amino acid   Horse ADH +   ADH + SDH + TDH +   All MDR
residue      yeast ADH     XDH + ζCr           proteins
Gly          20            13                  3
Val          9             2                   -
Ala          9             -                   -
Lys          6             2                   -
Asp          6             1                   -
Cys          6             -                   -
Glu          5             1                   -
Leu          5             -                   -
Pro          3             1                   -
Ile          3             -                   -
Arg          2             -                   -
His          2             -                   -
Phe          2             -                   -
Ser          2             -                   -
Thr          2             -                   -
Tyr          2             -                   -
Asn          1             -                   -
Gln          1             -                   -
Sum          86            20                  3
Gly/sum      20/86 (23%)   13/20 (65%)         3/3 (100%)

The columns are taken from three previous comparisons of the family: as discerned early (left), in 1993 (middle), and now (right; cf. Persson et al., 1994). In the insert, black indicates the positions of the three strictly conserved residues in the rightmost column. Dense and light stippling indicate the remaining glycine residues in the middle and left columns, respectively.
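The glycine shares in the last row of Table 3 follow directly from the column counts. The small check below transcribes the Table 3 counts, omitting residues absent from a column, and recomputes each column sum and the Gly/sum ratio. The dictionary layout and function name are illustrative only.

```python
# Strictly conserved residues per comparison, transcribed from Table 3
COLUMNS = {
    "Horse ADH + yeast ADH": {
        "Gly": 20, "Val": 9, "Ala": 9, "Lys": 6, "Asp": 6, "Cys": 6,
        "Glu": 5, "Leu": 5, "Pro": 3, "Ile": 3, "Arg": 2, "His": 2,
        "Phe": 2, "Ser": 2, "Thr": 2, "Tyr": 2, "Asn": 1, "Gln": 1,
    },
    "ADH + SDH + TDH + XDH + ζCr": {
        "Gly": 13, "Val": 2, "Lys": 2, "Asp": 1, "Glu": 1, "Pro": 1,
    },
    "All MDR proteins": {"Gly": 3},
}


def glycine_share(counts: dict[str, int]) -> tuple[int, int, float]:
    """Return (Gly count, total conserved residues, Gly percentage)."""
    total = sum(counts.values())
    gly = counts.get("Gly", 0)
    return gly, total, 100.0 * gly / total


for name, counts in COLUMNS.items():
    gly, total, pct = glycine_share(counts)
    print(f"{name}: {gly}/{total} ({pct:.0f}%)")
# Horse ADH + yeast ADH: 20/86 (23%)
# ADH + SDH + TDH + XDH + ζCr: 13/20 (65%)
# All MDR proteins: 3/3 (100%)
```

The recomputed sums match the table (86, 20 and 3). They show how the glycine share rises as more distant family members are included in the comparison.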
Figure 4. Evolutionary pattern illustrating the continuity and time dependence of the accumulation of mutational differences in the development of novel functions.


CONCLUSION 

Taken together, the different structures are related through three series of gene duplications, as illustrated in Fig. 1. They may be interpreted as relating isozymes, classes, and enzymes in a continuous flow of structural divergence and functional emergence. As shown in Fig. 4, repeated gene duplications, starting at distant times and continuing into recent times, have given rise to multiple forms. Through further mutational events, these forms are now traceable as present-day enzymes, classes, and isozymes, respectively. The patterns of origin of enzymes, classes, and isozymes are highly similar, as revealed by the middle portion of Fig. 4. The major difference is essentially the time of emergence, and hence the time available for accumulating subsequent mutations. The most recently originating forms are still fairly similar in function, the most distantly originating ones are the most different, and the intermediate ones are only moderately similar. In this manner, the pattern discerned from present-day structures and phylogenetic constructions reflects successive changes and functional divergence through time. The MDR enzymes of the alcohol dehydrogenase type show this well, nicely illustrating a pattern also discernible in several other well-studied protein families.

Building on these observations of native forms, the exact positions governing class and isozyme distinctions may now be scrutinized further by site-directed mutagenesis. Such tests can confirm the critical roles of particular positions in class and isozyme functions. For alcohol dehydrogenases, this approach has established the roles of positions 57 and 115 in the class I/III distinctions (Engeland et al., 1993; Estonius et al., 1994). It has also established the role of position 48 in the human isozyme distinctions (Höög et al., 1992). These results illustrate the value of using mutagenesis to confirm patterns traced through observations of native forms.
INTRODUCTION

Self-incompatibility is a system for distinguishing self pollen from non-self pollen in the female organ of flowering plants (de Nettancourt, 1977). This system is not caused by malfunction of the reproductive organs, since self-incompatibility does not appear when the pistil receives pollen carrying different S-alleles. Nicotiana alata has gametophytic self-incompatibility. In this system, pollen bearing the same S-allele as one of those in the pistil is rejected through arrest of pollen tube growth. This rejection takes place in the style, where S-allele-specific glycoproteins (S-glycoproteins) are synthesized prior to anthesis (Anderson et al., 1986). These glycoproteins are possibly responsible both for distinguishing self from non-self pollen and for arresting pollen tube growth. Recently, we have found that the S-glycoproteins associated with self-incompatibility are RNases of the RNase T2 family (McClure et al., 1989). This finding was based on predictions from amino acid sequence comparisons and on chemical modification experiments (Kawata et al., 1990). Subsequently, a number of studies have reported the primary structures of S-glycoproteins, mainly from Solanaceae. These studies have revealed that all the proteins contain two conserved amino acid sequences. These sequences include two histidine residues corresponding to those in the active site of RNase T2. According to very recent investigations (Huang et al., 1994; Lee et al., 1994; Murfett et al., 1994), RNase activity is indispensable for the function of S-glycoproteins in petunia and tobacco. However, the presence of this enzyme activity in the style is not sufficient for self-incompatibility to appear.
In fact, we detected the same RNase activity in the style (with the stigma) of both a self-incompatible cultivar (Nijisseiki) of Japanese pear (Pyrus pyrifolia) and its self-compatible mutant (Osa-Nijisseiki). Arabidopsis thaliana, an essentially self-compatible plant, also synthesizes enzymes of the RNase T2 type in the style (Taylor and Green, 1991). Moreover, two or more RNases have been detected in the style of heterozygous petunia (Lee et al., 1992) and tobacco (McClure et al., 1989). This implies that the style of self-incompatible plants also synthesizes RNases that are not associated with self-incompatibility. We undertook this investigation to characterize the RNases in the style of N. alata and to seek structural motif(s) specific for S-RNase or non-S-RNase. We separated three stylar RNases and elucidated their amino acid sequences. We assigned each RNase to S-RNase or non-S-RNase by analyzing its appearance in the style during flower development. This paper summarizes the results of some of these experiments and analyses of the structural information for RNase MS1, a novel non-S-RNase.


SEPARATION OF THREE MAJOR RNASES FROM THE STYLE OF N. ALATA

The extract of the style of N. alata was chromatographed on a Mono S column. This yielded three major peaks containing RNase activity, which were named MS1, MS2 and MS3 according to elution order (Fig. 1).

A single RNase was purified by reverse-phase HPLC from each of fractions MS1, MS2 and MS3: MS1 (29 kDa), MS2 (31 kDa) and MS3 (30 kDa), respectively. The N-terminal and internal amino acid sequences of these RNases were analyzed. The sequences revealed that all three are of the RNase T2 type and that they are structurally similar to each other.




Figure 1. Separation of fractions MS1, MS2 and MS3 by Mono S column chromatography. The proteins precipitated by 40-90% saturation with ammonium sulfate were chromatographed on a Mono S column (1.6 × 50 mm) using a gradient of 0-0.5 M NaCl in 50 mM sodium acetate buffer, pH 5.0, at a flow rate of 100 µl/min and at 8°C. Shaded columns indicate RNase activity as the percentage of the total RNase activity of fractions MS1, MS2 and MS3. The protein bearing RNase activity in each fraction was eluted as a single peak by reversed-phase HPLC, collected and used for sequence analysis.

Figure 2. Changes in RNase activity of fractions MS1, MS2 and MS3 during flower development. The extract of five or six styles at each stage was chromatographed on a Mono S column as described in the legend of Fig. 1, and each separated fraction was assayed for RNase activity. Style lengths at stages 1 to 5 were 0.5-1.5, 1.5-2.0, 2.5-3.5, 5.5-6.5 and 7 cm, respectively. Anthesis took place at stage 5.

ASSIGNMENT OF MS1, MS2 AND MS3 TO S-RNase AND NON-S-RNase

To examine whether these three RNases are associated with self-incompatibility, we followed their appearance in the style during flower development by assaying the RNase activity of the individual peaks separated by Mono S chromatography (Fig. 2).

At stage 1 (green bud, about a week before anthesis), RNase activity was clearly detected in the style of N. alata and was associated exclusively with fraction MS1. No activity was detectable in fractions MS2 and MS3. Fraction MS1 retained the same level of enzyme activity throughout all stages until anthesis. In contrast, the activity of fraction MS2 was first detected at stage 2 (green bud, about 5-6 days before anthesis) and then increased rapidly to a plateau at anthesis, amounting to about 60% of total RNase activity. RNase activity of fraction MS3 was also first detected at stage 2, but its increase in subsequent stages was marginal. Since N. alata acquires self-incompatibility when S-RNase is present in the style (Anderson et al., 1986), the observed time course of appearance of the individual stylar RNases suggests that MS1 is a non-S-RNase (RNase MS1) unassociated with self-incompatibility, whereas MS2 and MS3 are S-RNases associated with it.


THE AMINO ACID SEQUENCE OF RNase MS1

To determine the amino acid sequence of the stylar RNase MS1 purified above, two oligonucleotides corresponding to the conserved sequences surrounding each of the two essential histidine residues of RNase T2 were synthesized and used for RT-PCR with mRNA from a mixture of styles collected from buds and mature flowers. Three cDNAs [ms1 (651 bp), ms2 (654 bp) and ms3 (654 bp)] were eventually cloned. The nucleotide sequence of ms1 was determined, and its deduced amino acid sequence was verified by sequencing peptides obtained by digesting RNase MS1 with Achromobacter protease I, V8 protease and Asp-N.
This protein is composed of 197 amino acid residues, and its pI was calculated as 6.1. There is a potential N-glycosylation site (Asn 27) at which no PTH-Asn was identified by Edman degradation. Similarly, the amino acid sequences of MS2- and MS3-RNases were determined by sequencing their cloned cDNAs and verified as described above (data not shown). All three stylar RNases share several amino acid sequences that are conserved in Solanaceae S-RNases, and they were classified as enzymes of the RNase T2 family.
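The text gives the calculated pI of RNase MS1 but not the method. A common approach, sketched below in Python, sums Henderson-Hasselbalch charges of the ionizable groups and bisects for the pH at which the net charge is zero. The pKa values are an assumed approximate set (published tools use different ones), every Cys is treated as a free thiol, and the full MS1 sequence is not entered here, so this sketch is not expected to reproduce the value of 6.1 exactly.

```python
# Assumed approximate pKa values; tools differ and the authors' set is not stated.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge; every Cys counts as a free thiol."""
    def pos(pka: float) -> float:
        return 1.0 / (1.0 + 10 ** (ph - pka))

    def neg(pka: float) -> float:
        return -1.0 / (1.0 + 10 ** (pka - ph))

    charge = pos(PKA_N_TERM) + neg(PKA_C_TERM)
    for residue in seq.upper():
        if residue in PKA_POSITIVE:
            charge += pos(PKA_POSITIVE[residue])
        elif residue in PKA_NEGATIVE:
            charge += neg(PKA_NEGATIVE[residue])
    return charge


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisect on pH; net charge falls monotonically as pH rises, so the zero is unique."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for peptide in ("DE", "KR"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

Because disulfide-bonded cysteines do not ionize, a more careful estimate for an RNase T2 enzyme would exclude them from the Cys term.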


AMINO ACID SEQUENCE COMPARISON BETWEEN RNase MS1 AND S-RNases FROM Solanaceae

The amino acid sequence of RNase MS1 was compared with those of known S-glycoproteins from N. alata, revealing several interesting observations (Fig. 3).

First, the sequence identity between any two of MS1, MS2 and MS3 is 45-52%, showing that these three RNases are homologous proteins even though RNase MS1 is a non-S-RNase. Second, RNase MS1 is highly homologous (95% identity) to Sa-RNase, whose published amino acid sequence lacks approximately 20 N-terminal residues (Kheyr-Pour et al., 1990). Third, MS2-RNase has 74% sequence identity with S3-RNase. Fourth, MS3-RNase has a sequence identical to that of SF11-RNase.
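The percentages above depend on how identity is counted, which the text does not specify. One common convention, sketched here in Python, divides the number of identical positions by the number of aligned columns in which neither sequence has a gap. Applied to the single Figure 3 segment that contains residues 135-140 of RNase MS1 and Sa-RNase, it locates the six differing residues; the local value is naturally lower than the 95% quoted for the whole sequences. The segment sequences are copied from Figure 3, and the residue offset (first column = residue 133) is an inference from the numbering in Fig. 4.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identity over aligned columns in which neither sequence has a gap (one common convention)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


def differing_columns(a: str, b: str, gap: str = "-") -> list[int]:
    """0-based columns where both sequences have a residue and the residues differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if gap not in (x, y) and x != y]


# Figure 3 segment containing residues 135-140; first column assumed to be residue 133.
FIRST_RESIDUE = 133
MS1_SEGMENT = "DEIAEAIRAVTQAY-PNLNCVGDPQKILE"
SA_SEGMENT = "DECEKQSEAVTQAY-PNLNCVGDPQKILE"

if __name__ == "__main__":
    print(f"{percent_identity(MS1_SEGMENT, SA_SEGMENT):.1f}% identity in this segment")
    print([FIRST_RESIDUE + i for i in differing_columns(MS1_SEGMENT, SA_SEGMENT)])
```

Other conventions divide by the length of the shorter sequence or by the full alignment length including gaps, so values from different sources are not always directly comparable.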

As described earlier, the sequence homology between RNase MS1 and Sa-RNase is unusually high. It should therefore be possible to detect a structural motif characteristic of S-RNase or non-S-RNase by comparing these two proteins. The small (5%) sequence difference is localized to a short stretch of six consecutive residues at positions 135-140. If the unknown N-terminal sequence of Sa-RNase is the same as that established for RNase MS1, it may be possible to identify the amino acid residue(s) responsible for distinguishing S-RNase from non-S-RNase. Unfortunately, no particular features were detected when we carefully compared the short sequence of RNase MS1, Ile-Ala-Glu-Ala-Ile-Arg, with its counterpart in Sa-RNase, Cys-Glu-Lys-Gln-Ser-Glu. However, an interesting correlation was found at the nucleotide sequence level (Fig. 4).

Tick marks: 1, 10
MS1   TFDQLQLVLTWPP
S1    NFEYMQLVLTWPT
S2    AFEYMQLVLTWPI
S3    AFEYMQLVLQWPA
S6    AFEYMQLVLQWPT
Sa    (N-terminal sequence not determined)
SF11  DFEYLQLVLTWPA
Sz    DFDYMQLVLTWPA
X2    AYYEYMQLVLQWPT

Tick marks: 20, 30, 40, 50, 60
MS1   SFCHGKP-CTRI-PKNFTIHGLWPDEQHGMLNDCGET-FTKLREPREKKE-LDD
S1    AFCNVMN-CERT-PTNFTIHGLWPDNVSTELNYCDRQKKFKLFEDDKKQND-LDD
S2    TFCRIKH-CERT-PTNFTIHGLWPDNHTTMLNYCDRSKPYNMFTDGKKKND-LDE
S3    AFCHTTPSPCKRI-PNNFTIHGLWPDNVSTMLNYCDREDEYEKLDDDKKKKD-LDD
S6    AFCHTTP-CKNI-PSNFTIHGLWPDNVSTTLNFCGKEDDYNIIMDGPEKNG-LYV
Sa               I-PKNFTIHGLWPDEQHGMLNDCGET-FTLKREPREKKE-LDD
SF11  SFCYANH-CERIAPNNFTIHGLWPDNVKTRLHNCKPKPTYSYFTGKMLND-LDK
Sz    SFCYPKHF-CSRIAPKNFTIHGLWPDKVRGRLQFCTSEK-YVNFAQDSPILDDLDH
X2    AFCHASPT-CKVT-PNNFTIHGLWPDNVSTTLNYCKSKTGKYNNIKDPTIKNELYK

Tick marks: 70, 80, 90
MS1   RWPDLKRSRSDAQDVQSFWEYEYNKHGTCC
S1    RWPDLTLDRDDCKNGQGFWSYEYKKHGTCC
S2    RWPDLTKTKFDSLDKQAFWKDEYVKHGTCC
S3    RWPDLTIARADCIEHQVPWKHEYNKHGTCC
S6    RWPDLIREKADCMKTQNFWRREYIKHGTCC
Sa    RWPDLKRSRSDAQDVESFWEYEYNKHGTCC
SF11  HWMQLKFEQDYGRTEQPSWKYQYIKHGSCC
Sz    HWMELKYHRDFGLSNQFLWRGQYQKHGTCC
X2    RWPDLTTSETDCLGNQNFWKREYNKHGTCC

Tick marks: 100, 110, 120, 130
MS1   TELYDQAAYFDLAKNLKDKFDLLRNLKNEGIIPGSTYTV
S1    LPSYNQEQYFDLAMALKDKFDLLKSFRNHGIIPTKSYTV
S2    SDKFDREQYFDLAMTLRDKFDLLSSLRNHGISRGFSYTV
S3    SKSYNLTQYFDLAMALKDKFDLLTSLRKHGIIPGNSYTV
S6    SEIYNQVQYFRLAMALKDKFDLLTSLKNHGIIRGYKYTV
Sa    TELYDQAAYFDLAKNLKDKFDLLRNLKNEGIIPGSTYTV
SF11  QKRYNQNTYFGLALRLKDKFDLLRTLQTHRIIPGSSYTF
Sz    IPRYNQMQYFLLAMRLKDKFDLLATLRTHGITPGTKHTF
X2    SGRYNLQQYFHLAMALKDKFNLLTSLTNHGIIPGSNYNV

Tick marks: 140, 150, 160
MS1   DEIAEAIRAVTQAY-PNLNCVGDPQKILE
S1    QKYNNTVKAITKGF-PNLTCNKQ-ME
S2    QNLNNTIKAITGGF-PNLTCSRLR-E
S3    QKINSTIKAITQGY-PNLSCTKRQ-ME
S6    QKINNTIKTVTKGY-PNLSCTKGQ-E
Sa    DECEKQSEAVTQAY-PNLNCVGDPQKILE
SF11  QDIFDAIKTVSQEN-PDIKCAEVTKGTPE
Sz    NETRDAIKTVTNQVDPDLKCVEHIKGVRE
X2    QKINSTIKTITRGY-PNLSCTEE-ME

Tick marks: 170, 180, 190, 197
MS1   LSEIGICFDRGATKVITCRRRTTCNPINKKEISFPLN
S1    LQEIGICFDQKVKNVIDCPRPKTCKATR-NGITFP
S2    LKEIGICFDETVKNVIDCPNPKTCKPTN-KGVMFP
S3    LLEIGICFDSKVKNVIDCPHPKTCKPMGNRGIKFP
S6    LWEVGICFDSTAKNVIDCPNPKTCKTASNQGIMFP
Sa    LSEIGICFDRGATKVITCRRRTTCNPINKKEISFPLN
SF11  LYEIGICFTPNADSMFRCPQSDTCDKTA-KVL-FRR
Sz    LYEIGICFTPTADSFFQCPHSNTCDETGITKILFRR
X2    LWEIGICFDSTVKNVIDCPHPKTCNPT-IIKFP

Figure 3. Alignment of the amino acid sequences of RNase MS1, RNase X2 and S-RNases. S1-Sz represent enzymes reported as S-RNases in N. alata. X2 is RNase X2, which is not linked to the S-locus in P. inflata. Amino acid residues conserved among all listed RNases are marked with asterisks. Amino acid sequences were reported by McClure et al. (1989) (S2-S6), Kheyr-Pour et al. (1990) (S1, Sa, SF11 and Sz) and Lee et al. (1992) (X2).

                  135                      140
RNase MS1    Glu  Ile  Ala  Glu  Ala  Ile  Arg  Ala
Gene ms1     GAG  ATT  GCA  GAA  GCA  ATC  AGA  GCA
changes           -AT  -A      +AA            +A*
"Gene Sa"    GAG  TGC  GAA  AAG  CAA  TCA  GAA  GCA
Sa-RNase     Glu  Cys  Glu  Lys  Gln  Ser  Glu  Ala

Figure 4. Possible changes in the nucleotide sequence encoding residues 135-140 that would account for the difference between RNase MS1 and Sa-RNase. The sequence labeled "Gene Sa" is the putative nucleotide sequence of the gene encoding Sa-RNase. Either A or G may be inserted at the position marked by an asterisk.

If the first two bases (AT) of the ATT codon for Ile and the third base (A) of the following GCA codon for Ala in ms1 were deleted, and an AA sequence were inserted between the GAA codon (Glu) and the next GCA codon (Ala) and an A or G after the AGA codon for Arg, the nucleotide sequence coding for this part of RNase MS1 would be converted into one coding for the corresponding six residues of Sa-RNase. This finding is very interesting, because such structural differences in the nucleotide sequences may be responsible for the functional difference between RNase MS1 and Sa-RNase. However, it is unclear whether such mutations are possible within so limited a region of the nucleotide sequence. In any event, the following will be needed to clarify whether this structural difference determines the S-RNase identity of Sa-RNase in N. alata: verification of the ambiguous amino acid sequence of this limited region at the protein level; determination of the as yet unidentified N-terminal sequence; and reconfirmation of the association of Sa-RNase with self-incompatibility.
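The codon rearrangement described above can be checked mechanically. The Python sketch below applies the two deletions and two insertions to the ms1 segment shown in Fig. 4 (codons for residues 134-141) and translates the result with the standard genetic code; only the codons needed here are tabulated. It confirms that either an A or a G at the last insertion site gives the Sa-RNase sequence, since both GAA and GAG encode Glu. It checks the stated nucleotide arithmetic only and makes no claim about how such changes could arise.

```python
# Standard genetic code, restricted to the codons that occur in Fig. 4.
CODON_TABLE = {
    "GAG": "Glu", "GAA": "Glu", "ATT": "Ile", "ATC": "Ile", "GCA": "Ala",
    "AGA": "Arg", "TGC": "Cys", "AAG": "Lys", "CAA": "Gln", "TCA": "Ser",
}

MS1_NT = "GAGATTGCAGAAGCAATCAGAGCA"  # gene ms1, codons for Glu134 ... Ala141 (Fig. 4)
SA_PROTEIN = ["Glu", "Cys", "Glu", "Lys", "Gln", "Ser", "Glu", "Ala"]


def translate(nt: str) -> list[str]:
    if len(nt) % 3:
        raise ValueError("length is not a multiple of three")
    return [CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3)]


def apply_edits(nt: str, extra_base: str = "A") -> str:
    """Apply the changes described in the text, working from the 3' end so that
    the indices of the remaining (upstream) edits stay valid."""
    if extra_base not in ("A", "G"):
        raise ValueError("the text allows only A or G here")
    s = nt
    s = s[:21] + extra_base + s[21:]  # insert A/G after the AGA codon for Arg (nt 18-20)
    s = s[:12] + "AA" + s[12:]        # insert AA between GAA (nt 9-11) and GCA (nt 12-14)
    s = s[:8] + s[9:]                 # delete the third base (A) of the GCA codon at nt 6-8
    s = s[:3] + s[5:]                 # delete the first two bases (AT) of the ATT codon at nt 3-5
    return s


if __name__ == "__main__":
    print(translate(MS1_NT))
    for base in ("A", "G"):
        edited = apply_edits(MS1_NT, base)
        print(base, edited, translate(edited) == SA_PROTEIN)
```

The net length change is zero (two bases and one base removed, two bases and one base added), which is why the reading frame downstream of the edited stretch is preserved.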

Petunia inflata, a self-incompatible plant and a member of the Solanaceae family, also produces a non-S-RNase in the style, called RNase X2 (Lee et al., 1992). The sequence identity between RNase MS1 and RNase X2 is 48.7%, which is within the range of the average similarity among S-RNases. Moreover, apart from the sequences common to S-RNases, there is no particular local sequence similarity, and no special structural motif characteristic of non-S-RNases was found by comparing these two non-S-RNases. RNS2, a stylar RNase of the RNase T2 family, has been isolated from Arabidopsis thaliana, a plant devoid of self-incompatibility (Taylor et al., 1993). RNase MS1 shows 31% sequence identity to this senescence-associated RNase. Again, no particular motif in the primary structure indicated a structural relatedness to RNS2 as a non-S-RNase.


CONCLUSION 

The present investigation demonstrates that the style of heterozygous N. alata contains a novel RNase T2-type enzyme. The enzyme is thought to be a constitutive protein, indicating that not all pistil RNases of the RNase T2 type in self-incompatible plants are associated with self-incompatibility. It was difficult to distinguish RNase MS1 from the coexisting S-RNases on the basis of primary structure. A search for the structural factor(s) responsible for the functional difference between constitutive and S-locus-related RNases is in progress in our laboratory.

INTRODUCTION

Traditionally, proteins have been regarded as well-defined, uniform molecules, in which one molecule is virtually identical to the next. This notion has been supported by the fact that protein primary structure analysis by prediction from the DNA sequence, though highly efficient, yields a well-defined, unique amino acid sequence containing no direct indication of any kind of modification or processing. However, information is accumulating about protein heterogeneity, co- and posttranslational modification and processing, as well as about the functional implications of such structural variation (Krishna and Wold, 1993; Graves et al., 1994). Human fibrinogen may serve as an extreme example of a protein existing in a multitude of structural forms, many of which have been shown to differ in functional properties (Henschen and McDonagh, 1986; Henschen, 1993). In the following, the structural variations recognized so far, their possible functional effects, and some relevant identification procedures are described.


FIBRINOGEN  STRUCTURE 

Fibrinogen is a central protein in the blood coagulation system (Henschen and McDonagh, 1986). The fibrinogen molecule is composed of three pairs of non-identical peptide chains, denoted Aα, Bβ and γ, so that the overall structure can be described as (Aα, Bβ, γ)2. During blood clotting, thrombin proteolytically cleaves two pairs of peptide chains, releasing fibrinopeptides A and B to form fibrin monomer with the structure (α, β, γ)2. The fibrin monomer can polymerize in an ordered fashion. The human fibrinogen chains Aα, Bβ and γ contain 610, 461 and 411 amino acid residues, respectively, in their most commonly occurring forms. The peptide chains are interconnected both within each half and between the halves of the molecule by a total of 29 disulfide bridges. The molecular weight of the human protein is 340 000. Its covalent structure was first elucidated by protein sequence analysis (see Henschen and McDonagh, 1986; Henschen, 1993). The work was completed in 1979 and somewhat later confirmed and extended by DNA sequence analysis (see Chung et al., 1990). A model of human fibrinogen is shown in Fig. 1.

Figure 1. Human fibrinogen, a model of the covalent structure. The chains are aligned according to homology, with the N-termini in the center. The thin connecting lines represent disulfide bonds, the diamonds carbohydrate sidechains and the thin arrows thrombin cleavage sites. On the left side, the bold arrows point upwards to the sites for alternative processing during biosynthesis and downwards to the polymorphic sites. On the right side, the bold arrows point upwards to the sites for proteolytic processing and downwards to the phosphorylation site in the Aα chain, the proline-hydroxylation site in the Bβ chain and the sulfation site in the longer γ chain.


FIBRINOGEN  FUNCTION 

It is generally assumed that the most fundamental biological role of fibrinogen lies in its ability to form the skeleton of the blood clot and thereby prevent blood leakage. Furthermore, fibrinogen is believed to play a significant role in many additional pathophysiological processes, such as those related to wound healing, defense mechanisms, and tumor growth and metastasis. Many research groups have presented evidence that fibrinogen may be one of the most important risk factors for cardiovascular disease (Humphries et al., 1987; Kannel et al., 1987). To fulfill all these roles, fibrinogen has to interact in highly specific ways with itself and with a large number of other proteins, as well as with cells and low-molecular-weight ionic components, as listed in Table 1. Each type of interaction is expected to depend on the structure of one or more functional sites in the fibrinogen molecule.


FIBRINOGEN  HETEROGENEITY 

Human fibrinogen has long been described as a highly heterogeneous protein (see Henschen and McDonagh, 1986; Henschen, 1993), the heterogeneity being evident from variations in solubility properties, in the ion-exchange chromatographic behavior of total fibrinogen and of its peptide chain components, and in the gel-electrophoretic behavior of the peptide chains. Over the years, a considerable number of sites or sections of the molecule have been shown to exist in several structurally alternative forms, which at least partly explain the overall heterogeneity of the molecule. The regional variants in normal human fibrinogen belong to one of two main categories: those that are non-inherited and may be present in all individuals, and those that are inherited and therefore present only in certain individuals. Additional regional structural variants are caused by several types of non-genetic and genetic diseases. The types of variants identified so far in normal human fibrinogen are listed in Table 2. The positions of the more unique sites within the protein structure are indicated in Fig. 1 and the corresponding sequences in Table 3.

Table 1. Functional sites in fibrinogen

Intrinsic              Thrombin cleavage, polymerization, crosslinking, plasmin cleavage
Protein interaction    Thrombin, factor XIII, plasmin(ogen), plasminogen activators,
                       fibronectin, α2-antiplasmin, thrombospondin, albumin, collagen,
                       lipoprotein(a)
Cell interaction       Platelets, erythrocytes, monocytes, macrophages, endothelial cells,
                       fibroblasts, staphylococci, streptococci
Ion binding            Heparin, calcium, zinc, citrate, EDTA

Table 2. Types of variants in human fibrinogen

Non-inherited   Alternative processing of Aα/γ chains
                Phosphorylation of Aα chain
                Sulfation of γ chain
                Proline-hydroxylation of Bβ chain
                Glutamine-cyclization in Bβ chain
                Methionine-oxidation in Aα/Bβ/γ chains
                Desamidation of Aα/Bβ/γ chains
                Glycosylation of Bβ/γ chains
                Proteolysis of Aα/γ chains
Inherited       Polymorphism in Aα/Bβ chains
                Mutation in Aα/Bβ/γ chains

Normal variant combinations: over 1 million.

Table 3. Posttranslationally modified amino acids in human fibrinogen

Modification      Peptide chain   Residue   Position   Sequence
Phosphorylation   Aα              S         3          ADSGEGD
Phosphorylation   Aα              S         345        NPGSSER
Sulfation         Long γ          Y         418        ETEYDSL
Sulfation         Long γ          Y         422        DSLYPED
Hydroxylation     Bβ              P         31         SLRPAPP
Cyclization       Bβ              Q         1          ZGVNDN
Glycosylation     Bβ              N         364        MGENRTM
Glycosylation     γ               N         52         QVENKTS

Non-inherited  Variants 

There are three principal types of non-inherited regional variants in mammalian fibrinogen. They are caused by alternative processing during biosynthesis, posttranslational modification of specific amino acid residues, and proteolytic degradation.

Alternative Processing. Both the Aα and the γ chain occur in two different forms due to alternative biosynthetic processing. The two forms differ at the C-terminal end of the peptide chain, depending on the utilization or non-utilization of an additional exon (Fig. 1). In about 10% of the γ chains, the last four of the 411 amino acid residues are replaced by a stretch of 20 amino acids (Chung and Davie, 1984; Fornace et al., 1984). The two types of γ chain differ functionally in that only the shorter form can mediate the interaction of fibrinogen with platelets. However, both forms can be crosslinked by the transglutaminase factor XIII and can thus participate in clot stabilization. In the Aα chain, only about 2% of the chains seem to occur in the longer form, but here the chain carries a 236-residue extension whose function is not yet known (Fu and Grieninger, 1994).
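The numbers in this paragraph fix the residue numbering of the longer γ chain. The short Python sketch below derives its length and the stretch of positions unique to it, assuming the replacement stretch is numbered continuously from the first replaced residue. It also checks that the two sulfated tyrosines listed in Table 3 (positions 418 and 422) fall inside that stretch, which is consistent with the statement below that only the longer γ chain is sulfated.

```python
# Only numbers stated in the text are used: a 411-residue gamma chain whose
# last 4 residues are replaced by 20 residues in the longer form.
COMMON_LENGTH = 411
REPLACED = 4
EXTENSION = 20


def long_gamma_length() -> int:
    return COMMON_LENGTH - REPLACED + EXTENSION


def variant_region() -> range:
    """Positions (1-based, inclusive) found only in the longer gamma chain."""
    start = COMMON_LENGTH - REPLACED + 1  # first replaced residue
    return range(start, long_gamma_length() + 1)


if __name__ == "__main__":
    region = variant_region()
    print(long_gamma_length(), region.start, region.stop - 1)
    for pos in (418, 422):  # sulfated tyrosines, Table 3
        print(pos, pos in region)
```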

Phosphorylation. All three peptide chains are posttranslationally modified at certain amino acid residues (Table 3). The Aα chain is partially phosphorylated at two serine residues (Seidewitz et al., 1984), one in fibrinopeptide A, i.e. position 3 of the chain, and the other in the middle part of the chain, at position 345. Both serine residues occur in a Ser-Xaa-Glu sequence, i.e. a recognition sequence for casein kinase II. Both positions are assumed to be completely phosphorylated during biosynthesis, with the phosphate groups subsequently removed by a phosphatase in the blood so that only about 20% remain. However, during an acute-phase reaction, in which increased synthesis gives rise to a higher fibrinogen level, and in the fetus or newborn, up to 70% phosphorylation is observed (Seidewitz et al., 1984). Phosphorylation often serves as an important signal in biological processes, but its functional relevance in fibrinogen is unclear, as the rate of thrombin-induced fibrinopeptide release is independent of the degree of phosphorylation. There is, however, some indication that phosphorylation protects against proteolytic degradation (see below).
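The Ser-Xaa-Glu motif named above can be located by a simple pattern scan. The Python sketch below searches the two seven-residue windows from Table 3 and converts hits to chain positions. The window start positions are inferred rather than stated: ADSGEGD is residues 1-7 of the Aα chain (fibrinopeptide A), and NPGSSER is assumed to be centered on Ser345 like the other Table 3 windows. The scan uses only the motif as written in the text, not a complete kinase specificity model.

```python
import re

# Seven-residue windows from Table 3 and the chain position of their first residue
# (start positions inferred, see the text above).
WINDOWS = {"ADSGEGD": 1, "NPGSSER": 342}

# Ser-Xaa-Glu as stated in the text; the lookahead also reports overlapping hits.
SER_XAA_GLU = re.compile(r"(?=S.E)")


def motif_serines(window: str, first_position: int) -> list[int]:
    """Chain positions of serines that start a Ser-Xaa-Glu triplet in the window."""
    return [first_position + m.start() for m in SER_XAA_GLU.finditer(window)]


if __name__ == "__main__":
    for window, start in WINDOWS.items():
        print(window, motif_serines(window, start))
```

Using a zero-width lookahead rather than a plain `S.E` pattern matters only when hits overlap, as in a hypothetical run such as SSEE, where both serines would be reported.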

The presence of a phosphorylated residue can often be surmised when, after fractionation by reversed-phase high-performance liquid chromatography (HPLC), two peptide fragments turn out to contain the same amino acid sequence except at a certain position, where the earlier-eluting fragment shows a suspiciously low yield of, e.g., serine, or even a gap in the sequence, and the later-eluting fragment a normal yield of the residue. Confirmation can conveniently be obtained by digestion with alkaline phosphatase, followed by re-chromatography and re-sequencing; the earlier-eluting fragment should now appear at the position of the later-eluting one, and the yield of the relevant amino acid residue should be normal.

Sulfation. Many mammalian fibrinogens contain sulfated tyrosine residues, typically found in highly acidic sequence environments. In the human protein, only the longer γ chain splice variant is sulfated (Farrell et al., 1991). The two tyrosines, at positions 418 and 422, were both fully sulfated in the samples analyzed so far (Henschen, 1993). The functional relevance of the modification is unknown.

Sulfation of tyrosine residues may often escape attention, as the modification is highly acid-labile: the sulfate group is hydrolyzed off during sequencing, and unmodified tyrosine appears in the PTH amino acid chromatogram. To identify modified tyrosine residues, a procedure was developed that utilizes the protective effect of the sulfate (or phosphate) group against nitration by tetranitromethane. When peptide fragments are nitrated before sequencing, the originally unmodified tyrosines are completely converted into nitrotyrosine, which can easily be identified in high yield during standard PTH amino acid analysis (eluting between Val and Phe/Lys, but separate from DPTU), whereas sulfated tyrosines appear exclusively as tyrosine, though in slightly lower yield (Henschen, 1993). Additional evidence for sulfation can be obtained by specific cleavage of the sulfate bond with arylsulfatase or with M HCl for 4 minutes at 100 °C, followed by identification of unmodified tyrosine in the sequence (phosphorylated tyrosines are unaffected by these treatments). Furthermore, sulfated peptides are eluted before the corresponding unsulfated ones on reversed-phase HPLC.
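The reasoning of the two paragraphs above amounts to a small decision rule for one tyrosine position. The Python sketch below encodes it. The input labels ("NO2-Tyr", "Tyr") and the boolean describing the outcome of the arylsulfatase or hot-acid treatment are hypothetical encodings of the observations, not output formats of any sequencer or software.

```python
from enum import Enum


class TyrosineState(Enum):
    UNMODIFIED = "unmodified tyrosine"
    SULFATED = "O-sulfotyrosine"
    PHOSPHORYLATED = "O-phosphotyrosine"


def classify_tyrosine(pth_after_nitration: str, restored_by_desulfation: bool) -> TyrosineState:
    """Apply the logic of the nitration procedure to one tyrosine position.

    pth_after_nitration: derivative called at that cycle when the peptide was
        nitrated with tetranitromethane before sequencing: "NO2-Tyr" or "Tyr".
    restored_by_desulfation: True if arylsulfatase or the hot-acid treatment
        turns the residue into unmodified tyrosine.
    """
    if pth_after_nitration == "NO2-Tyr":
        # No protecting group: the free tyrosine was nitrated.
        return TyrosineState.UNMODIFIED
    if pth_after_nitration != "Tyr":
        raise ValueError(f"unexpected PTH call: {pth_after_nitration!r}")
    # Protected from nitration, so a sulfate or phosphate group is present;
    # only the sulfate ester is removed by the desulfation treatments.
    if restored_by_desulfation:
        return TyrosineState.SULFATED
    return TyrosineState.PHOSPHORYLATED


if __name__ == "__main__":
    print(classify_tyrosine("Tyr", True).value)
```

The earlier HPLC elution of sulfated peptides mentioned above could serve as an additional, independent input to such a rule.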

Proline-Hydroxylation. About 20% of the human Bβ chains are hydroxylated at the proline residue in position 31 (Henschen et al., 1991). The finding was quite unexpected, as hydroxyproline occurs primarily in collagen-like proteins, where it is of great functional importance in regulating the thermal stability of the triple helix. In fibrinogen, however, its function is unknown, and the sequence around the modified proline differs from the collagen consensus sequence. So far only a few samples of fibrinogen have been analyzed for hydroxyproline content.

The presence of 4-hydroxyproline can easily be established in a homogeneous peptide by sequence analysis, in which two characteristic PTH amino acid derivatives may be observed. In addition, the hydroxylated fragment is eluted before the corresponding unmodified fragment on reversed-phase HPLC.

Cyclization of N-Terminal Glutamine. The Bβ chains of many mammalian fibrinogens, including the human, start with a pyroglutamic acid residue, which according to DNA sequence analysis is derived from a glutamine residue. In fact, the N-terminal pyroglutamic acid in human fibrinogen seems to have been the first residue of this kind detected in a protein (see Blombäck et al., 1966). In the human Bβ chain, the conversion to the pyroglutamic acid form is already complete in the blood. The blocked N-terminus may serve as protection against degradation by aminopeptidases (see below).

Methionine-Oxidation. Recently, it was discovered that several methionine residues in all three peptide chains of human fibrinogen exist partially in the methionine sulfoxide form (Chen and Henschen, 1994). It had been observed that certain methionine residues are partly resistant to cyanogen bromide cleavage in native, but not in mercaptolyzed and alkylated, fibrinogen. The incomplete cleavage resulted in the appearance of additional components on fractionation by reversed-phase HPLC. The extent of oxidation at the relevant positions seemed to be 10-25%. The oxidation was not caused by the commercial fibrinogen purification procedure, which often includes virus-inactivation treatment, since fibrin quickly isolated from blood donor plasma by simple clotting with thrombin showed the same cyanogen bromide fragmentation pattern as the commercial fibrinogen. It may be suggested that the oxidation is caused by hypochlorite-related agents released by activated leukocytes or phagocytes in the blood. Both the number of oxidized methionines and the extent of oxidation could be specifically increased by treatment with low concentrations of chloramine-T. The oxidation caused a chloramine-T-concentration-dependent loss of clotting or polymerization ability. Molecular species with damaged and undamaged polymerization sites could be separated by affinity chromatography, allowing the identification of methionine residues relevant to fibrinogen function (Chen and Henschen, 1994).

The analysis of methionine sulfoxide is hampered by the labile nature of both methionine and its sulfoxide, one being easily converted into the other. Certain methionine residues seem prone to oxidation, especially in denatured proteins. Methionine sulfoxide residues appear as methionines during sequence and amino acid analysis. A procedure was therefore developed for the identification of the oxidized residues (Chen and Henschen, 1994). The positions of these residues and the extent of oxidation could be established by quantitative N-terminal sequence analysis after cyanogen bromide cleavage of specifically modified samples. Thus, in one set of samples, all methionine residues were converted into the cyanogen bromide-refractory form by alkylation, and the methionine sulfoxide residues were subsequently reduced to methionine by mercaptoethanol treatment. In control samples, the reduction preceded the alkylation. The results indicated that the reduction, alkylation and cyanogen bromide cleavage were quantitative under the conditions used, so that the sequencing results could be used to identify and quantify the modified residues.
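Why the order of reduction and alkylation matters can be shown with a toy model. In the Python sketch below, each methionine position in a molecule is in one of three states; alkylation is assumed to act only on free methionine and reduction only on the sulfoxide, as the procedure above implies. Alkylating first leaves the originally oxidized positions as the only cleavable ones, whereas reducing first makes every methionine refractory, as in the controls. The population of molecules is hypothetical, and the real quantification relied on sequencing yields rather than a count of molecules.

```python
from typing import List

# States of one methionine position in one molecule (hypothetical encoding):
#   "M" free methionine (cleaved by CNBr)
#   "O" methionine sulfoxide (not cleaved by CNBr)
#   "A" alkylated methionine (CNBr-refractory)


def alkylate(states: List[str]) -> List[str]:
    # Assumption: only free methionine reacts.
    return ["A" if s == "M" else s for s in states]


def reduce_sulfoxide(states: List[str]) -> List[str]:
    return ["M" if s == "O" else s for s in states]


def cnbr_sites(states: List[str]) -> List[int]:
    return [i for i, s in enumerate(states) if s == "M"]


def alkylate_then_reduce(states: List[str]) -> List[int]:
    """Analysis samples: only originally oxidized positions remain cleavable."""
    return cnbr_sites(reduce_sulfoxide(alkylate(states)))


def reduce_then_alkylate(states: List[str]) -> List[int]:
    """Control samples: every methionine ends up refractory."""
    return cnbr_sites(alkylate(reduce_sulfoxide(states)))


def extent_of_oxidation(molecules: List[List[str]], position: int) -> float:
    """Fraction of molecules cleaved at `position` in the analysis samples."""
    cleaved = sum(position in alkylate_then_reduce(m) for m in molecules)
    return cleaved / len(molecules)


if __name__ == "__main__":
    # Hypothetical population of four molecules with three methionine positions each.
    population = [["M", "O", "M"], ["M", "M", "M"], ["O", "O", "M"], ["M", "M", "M"]]
    for pos in range(3):
        print(pos, extent_of_oxidation(population, pos))
```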

Amide-Ammonia Loss. Preliminary results indicate that certain asparagine and glutamine residues in all three peptide chains have been partially converted to aspartic acid and glutamic acid residues by spontaneous amide-ammonia loss. The desamidation results in the appearance of pairs of reversed-phase HPLC components with identical sequences, except at certain positions that differ in their state of amidation. The amidated versions of the peptides are eluted before the corresponding desamidated peptides. It is not known whether amide-ammonia loss has any effect on the function or survival of fibrinogen.
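Recognizing such a peak pair reduces to a residue-by-residue comparison. The Python sketch below takes the sequences of the earlier- and later-eluting components and reports the positions where the earlier one carries Asn or Gln and the later one Asp or Glu, rejecting any pair that differs in some other way. The peptide used in the example is hypothetical and is not a fibrinogen fragment.

```python
# Amide residues and the acidic residues formed by amide-ammonia loss.
DESAMIDATION = {"N": "D", "Q": "E"}


def desamidation_sites(early: str, late: str) -> list[int]:
    """Compare the sequences of an HPLC peak pair (earlier- and later-eluting).

    Returns the 0-based positions where the earlier peptide carries Asn/Gln and the
    later one Asp/Glu. Any other difference raises ValueError, because the pair
    would then not be explained by amide-ammonia loss alone.
    """
    if len(early) != len(late):
        raise ValueError("the two peptides must have sequences of equal length")
    sites = []
    for i, (a, b) in enumerate(zip(early, late)):
        if a == b:
            continue
        if DESAMIDATION.get(a) == b:
            sites.append(i)
        else:
            raise ValueError(f"position {i}: {a}->{b} is not an amide-to-acid change")
    return sites


if __name__ == "__main__":
    # Hypothetical peptide pair, for illustration only.
    print(desamidation_sites("AKNGQLR", "AKDGELR"))
```

Requiring the amidated form to be the earlier-eluting member follows the elution order stated in the text; a pair supplied in the reverse order is rejected.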

Glycosylation. Fibrinogen is glycosylated at two different sites (Fig. 1), i.e. in the N-terminal region at position 52 of the γ chain (Blombäck et al., 1973) and in the C-terminal region of the Bβ chain (Töpfer-Petersen et al., 1976). The two carbohydrate sidechains are highly similar, since both are N-glycosidically linked to asparagine residues and are biantennary. The glycosylation at the two sites is complete, but heterogeneity is caused by the presence of either two or one sialic acid residue per sidechain. The amount of sialic acid influences the rate of fibrin polymerization in that an increase in acidic charge delays polymerization. An increased extent of sialylation, probably in a tri- or tetra-antennary sidechain, and a correspondingly impaired polymerization are found in individuals with liver disease. A different, less specific type of glycosylation is caused by the excessive glucose level in diabetics, the glucose being added to certain amino groups in the protein.

Proteolytic Degradation. The Aα and γ chains occur in degraded forms even in the blood of normal, healthy individuals. The high susceptibility of the Aα chain to proteolytic degradation gives rise to the well-known molecular weight variants of human fibrinogen. Full-size, undegraded fibrinogen has a molecular weight of 340 kDa and accounts for 70% of blood plasma fibrinogen. A degraded form with a molecular weight of 305 kDa accounts for 25%, and one of 270 kDa for the remaining 5%. The three forms are designated HMW, LMW and LMW', respectively, and they differ in their Aα chains. The N-terminal region of the chain is preserved in all three forms. In the LMW form, one of the two Aα chains of the molecule lacks a C-terminal portion; in the LMW' form, both Aα chains lack their C-terminal part. The degradation produces a heterogeneous C-terminus, with identified C-terminal residues corresponding to positions 269, 297 and 309 (Nakashima et al., 1992). The enzyme responsible for the degradation has not yet been identified, but both plasmin and leukocyte elastase can be excluded because of their characteristic cleavage patterns (Müller and Henschen, 1988). Differences in the distribution among the HMW, LMW and LMW' forms have been observed in connection with certain diseases. Fibrin clots derived from degraded fibrinogen are less stable. An additional, less extensive C-terminal degradation of about 25% of the Aα chains produces a variant ending at position 583; this degradation is presumably caused by plasmin, as it corresponds to the earliest plasmic cleavage site in fibrinogen. The Aα chain is also N-terminally degraded in normal individuals, but only to the extent that the first amino acid is missing in 10% of the chains, presumably owing to the action of an aminopeptidase. The first residue is lacking only in chains that have lost the phosphate group at position 3, indicating a protective effect of phosphorylation against proteolysis.
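As a small worked check on the distribution quoted above, the fraction-weighted mean molecular weight of plasma fibrinogen can be computed from the three forms. This is only arithmetic on the stated figures; treating the percentages as molar (number) fractions is an assumption, since the text does not state their basis.

```python
# Molecular-weight variants of human fibrinogen as quoted in the text:
# (name, molecular weight in kDa, fraction of plasma fibrinogen)
VARIANTS = [
    ("HMW", 340.0, 0.70),
    ("LMW", 305.0, 0.25),
    ("LMW'", 270.0, 0.05),
]


def mean_molecular_weight(variants):
    """Fraction-weighted mean molecular weight in kDa.

    Assumes the quoted percentages are molar (number) fractions.
    """
    total_fraction = sum(frac for _, _, frac in variants)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(mw * frac for _, mw, frac in variants)


print(round(mean_molecular_weight(VARIANTS), 2))  # 327.75
```

Under this assumption the average lies about 12 kDa below the undegraded 340 kDa form.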

The γ chain is also proteolytically degraded in a heterogeneous way from the C-terminal side, with the N-terminus preserved (Henschen and Edman, 1972). In about 7% of the γ chains, 200-300 amino acid residues have disappeared. Here, too, the causative agent is unknown. The shortened γ chains are unable to participate in clot stabilization by crosslinking or in platelet interaction.

Common,  Inherited  Variants 

Two types of inherited regional variants are present in human fibrinogen: those that are common and those that are very uncommon in the population. The common genetic variants may be detected by comparing samples from many individuals in the population; these polymorphic variants give rise to sequence microheterogeneity in pooled samples. Two polymorphic sites have recently been detected in human fibrinogen: position 312 in the Aα chain can contain threonine or alanine, and position 448 in the Bβ chain lysine or arginine (Baumann and Henschen, 1993, 1994). In California blood donors, the allele frequencies were 0.76 and 0.24 for the threonine-alanine pair and 0.85 and 0.15 for the lysine-arginine pair. A polymorphic variation is, in principle, expected to be unrelated to the functional properties of the protein, and nothing is yet known about a direct effect of the polymorphic variation on the properties of fibrinogen. However, the polymorphism in the Bβ chain turned out to be highly correlated with some other polymorphisms in the Bβ gene (Baumann and Henschen, 1994) that had previously been reported to correlate with the level of fibrinogen in the blood, and thus, at least indirectly, with the role of fibrinogen as a risk factor in thromboembolic, cardiovascular disease (Humphries et al., 1987; Kannel et al., 1987). It now seems worthwhile to ask whether the variation in protein structure, the variation in gene structure, or possibly both together contribute to the fibrinogen-related risk.
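To make the reported allele frequencies concrete, the expected genotype frequencies at each polymorphic site can be sketched. This assumes Hardy-Weinberg equilibrium (random mating, no selection), which the text does not claim for the donor population; the site labels are written in ASCII for convenience.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (PP, PQ, QQ) at a biallelic site.

    p is the frequency of the first allele. Assumes Hardy-Weinberg
    equilibrium, which is not established by the text.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


# Allele frequencies reported for California blood donors
SITES = {
    "A-alpha 312 Thr/Ala": 0.76,
    "B-beta 448 Lys/Arg": 0.85,
}

for name, p in SITES.items():
    hom1, het, hom2 = hardy_weinberg(p)
    print(f"{name}: {hom1:.4f} {het:.4f} {hom2:.4f}")
```

For the Aα 312 site this gives about 0.578 Thr/Thr, 0.365 Thr/Ala and 0.058 Ala/Ala; whether the observed genotype counts match these proportions is not addressed in the text.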

Recently, a system has been developed for analyzing the polymorphic variants of the fibrinogen protein: after polymerase chain reaction amplification, the DNA is digested with the restriction endonucleases RsaI and MnlI, so that the variants are detected as restriction fragment length polymorphisms (Baumann and Henschen, 1993).
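The principle of RFLP detection can be illustrated with a simulated digest. RsaI recognizes the palindrome 5'-GTAC-3' and cuts between T and A, so a base change that destroys the site merges two fragments into one. The sketch below models only RsaI (MnlI is not modeled), and the two short "allele" sequences are hypothetical, not fibrinogen gene sequences.

```python
import re

# RsaI recognizes 5'-GTAC-3' and cuts between T and A (GT^AC).
RSAI_SITE = "GTAC"
RSAI_CUT_OFFSET = 2  # cut position counted from the start of the site


def digest(sequence, site=RSAI_SITE, cut_offset=RSAI_CUT_OFFSET):
    """Fragment lengths after complete digestion of a linear DNA molecule.

    Only the given strand is scanned, which is sufficient for a
    palindromic site such as GTAC.
    """
    seq = sequence.upper()
    # A zero-width lookahead finds every occurrence of the site
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical amplicons differing at one base (GTAC -> GCAC)
allele_with_site = "AAAGTACCCCGTACTT"
allele_without_site = "AAAGTACCCCGCACTT"
print(digest(allele_with_site))     # [5, 7, 4]
print(digest(allele_without_site))  # [5, 11]
```

In a real assay the fragment patterns would be read from a gel; the point here is only that the presence or absence of one site changes the fragment lengths.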

Rare, Inherited Variants

Uncommon, inherited variants of human fibrinogen have so far been observed only in association with fibrinogen dysfunction, i.e. incorrect or insufficient function. Genetically abnormal fibrinogens have now been detected in over 300 families, and the structural aberrations have been identified in over 80 of these (Henschen and McDonagh, 1986; Ebert, 1991). Clearly, the abnormal, dysfunctional variants can serve as highly specific probes of structure-function relationships in fibrinogen. However, most genetic variants are discovered in routine hospital laboratories when prolonged thrombin clotting times are noticed, and this selects for those dysfunctional variants that affect thrombin cleavage and fibrin monomer polymerization. Of the more than 80 structurally elucidated variants, 55 of the structural errors were detected in the regions of fibrinopeptides A and B and the corresponding thrombin cleavage sites, 20 in the primary, complementary polymerization site in the carboxy-terminal region of the γ chain, and only 6 in other parts of the fibrinogen structure. Remarkably, only 22 of the structural errors are unique to a single family. Nevertheless, the mutant fibrinogen genes are rare in the population, and most individuals carrying these genes are therefore heterozygous.
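The step from "rare in the population" to "carriers are mostly heterozygous" can be made explicit under Hardy-Weinberg proportions, an assumption not made in the text. For a mutant allele of frequency q, the fraction of carriers who are homozygous is q^2 / (2pq + q^2) = q / (2 - q), roughly q/2 when q is small. The allele frequencies below are hypothetical.

```python
def homozygous_fraction_of_carriers(q):
    """Fraction of carriers of an allele with frequency q who are homozygous.

    Assumes Hardy-Weinberg proportions: q^2 / (2pq + q^2) = q / (2 - q).
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    p = 1.0 - q
    return q * q / (2.0 * p * q + q * q)


# Hypothetical frequencies for a rare mutant fibrinogen allele
for q in (0.01, 0.001, 0.0001):
    print(q, homozygous_fraction_of_carriers(q))
```

Even at q = 0.01, only about one carrier in two hundred would be expected to be homozygous.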


CONCLUSION 

In summary, normal human fibrinogen contains at least 13 variant sites in the Aα chain (due to alternative processing, phosphorylation, oxidation, desamidation, proteolysis and polymorphism), 9 in the Bβ chain (due to hydroxylation, oxidation, desamidation, glycosylation and polymorphism), and 9 in the γ chain (due to alternative processing, sulfation, oxidation, desamidation, glycosylation and proteolysis). Each of these 31 regional variants may be distributed symmetrically or asymmetrically in the fibrinogen molecules, and one or two variants may occur in genetically homozygous form. Obviously, some variants are mutually exclusive. It can be estimated that each individual carries over one million non-identical fibrinogen molecules in the blood.
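The text does not show how the figure of over one million was obtained. The sketch below only illustrates how quickly the number of possible combinations grows; it assumes (hypothetically) that each of the 31 sites takes just two states, that sites vary independently, and it ignores mutual exclusivity and the very unequal frequencies of the variants. Because fibrinogen is a dimer of two (Aα, Bβ, γ) halves, symmetric and asymmetric molecules are counted as unordered pairs of half-molecules.

```python
from math import comb

# Variant sites per chain, as summarized in the text
SITES_PER_CHAIN = {"A-alpha": 13, "B-beta": 9, "gamma": 9}


def half_molecule_variants(sites, states_per_site=2):
    """Distinct half-molecules if every site independently takes
    states_per_site states (a hypothetical simplification)."""
    return states_per_site ** sum(sites.values())


def dimer_variants(n_half):
    """Unordered pairs of half-molecules: n symmetric (identical halves)
    plus C(n, 2) asymmetric pairs."""
    return n_half + comb(n_half, 2)


total_sites = sum(SITES_PER_CHAIN.values())
n_half = half_molecule_variants(SITES_PER_CHAIN)
print(total_sites)                     # 31
print(n_half)                          # 2147483648
print(dimer_variants(n_half) > 10**6)  # True
```

The combinatorial space is far larger than one million, so the estimate in the text is plausibly conservative once exclusivity and low variant frequencies are taken into account; this is an interpretation, not the authors' calculation.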

INTRODUCTION

The endoplasmic reticulum (ER) is an important organelle involved in functions as diverse as protein translocation, protein folding and phospholipid biosynthesis. It is the entry point for most membrane and soluble proteins into the secretory pathway. Most proteins destined for translocation into the ER contain an N-terminal signal sequence consisting of basic amino acids followed by a stretch of hydrophobic amino acids. The SRP (signal recognition particle) binds to the signal sequence of a nascent secretory protein on the ribosome. The ribosome-SRP complex then binds to its receptor (DP, docking protein) in the ER membrane. After release of SRP, the nascent proteins are inserted into the membrane or translocated into the lumen of the ER (Walter and Lingappa, 1986; Rapoport, 1992) by a number of membrane proteins such as TRAMp and the Sec61p complex. In the lumen of the ER, the newly synthesized polypeptide may undergo ER-specific co- and post-translational modifications such as cleavage of the signal peptide, disulfide bond formation, N-linked glycosylation, fatty acylation or prolyl hydroxylation. Thus, a multitude of functions are carried out by ER proteins that are either integrated into the membrane or located in the lumen. None of these functions is yet completely understood, because not all of the proteins involved in these processes have been identified.
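The signal sequence described above (basic residues followed by a hydrophobic stretch) can be turned into a crude scan. This is a hypothetical illustration, not a method used in this work: the hydrophobic and basic residue sets, the 30-residue window and the test sequence are all assumptions, and the sketch is not a validated signal-peptide predictor.

```python
HYDROPHOBIC = set("AVLIMFWC")  # hypothetical, simplified hydrophobic set
BASIC = set("KR")


def signal_like_region(seq, n_region=30):
    """Find the longest hydrophobic run in the first n_region residues and
    count the basic residues that precede it.

    Returns (start, length, n_basic_before), with a 0-based start index.
    """
    head = seq[:n_region].upper()
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, aa in enumerate(head):
        if aa in HYDROPHOBIC:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    n_basic = sum(1 for aa in head[:best_start] if aa in BASIC)
    return best_start, best_len, n_basic


# Hypothetical N-terminal sequence: Met, three basic residues, hydrophobic core
print(signal_like_region("MKKRLLLLAVLLAVAGSAQ"))  # (4, 11, 3)
```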

For these reasons, we have started a systematic analysis of the proteins located in canine pancreatic microsomes, using a mini two-dimensional (2-D) PAGE technique followed by blotting of the separated polypeptides onto PVDF membranes and automated sequencing.

Here we present first results from the N-terminal amino acid sequencing of twenty-one protein spots taken from one to four Coomassie Blue-stained PVDF membranes. The amino acid sequences were compared using the FASTA program of the Genetics Computer Group. Eight protein spots could be clearly identified. For the others, either insufficient sequence information was obtained or the N-terminus was blocked.

The preceding fractionation of the microsomes into lumenal and membrane fractions by detergent partitioning with Triton X-114 (Bordier, 1981) provided additional information about the cellular localization of the identified polypeptides and increased the resolution of the individual polypeptide spots.


METHODS 

Two-Phase  Partitioning  with  Triton  X-114 

Ribosome-stripped microsomes were prepared by standard methods (Walter and Blobel, 1983). This microsomal preparation (about 1 mg total protein) in 200 µl buffer containing 50 mM HEPES-KOH (pH 7.8), 500 mM potassium acetate, 5 mM magnesium acetate, 5 mM DTT and protease inhibitor (1:500) was centrifuged for 10 min at 70,000 rpm at 2°C in micro test tubes in a TLA 100.3 rotor.

The pellet was dissolved by incubation for 10 min at 0°C in 900 µl of the same buffer, to which 100 µl of 20% Triton X-114 (TX-114, from Sigma) was added (final concentration 2%). After centrifugation at 14,000 rpm, insoluble material was removed and the supernatant was divided into two aliquots. One was treated with ethanol to obtain a pellet containing all proteins. The other was subjected to phase separation for 10 min at 37°C, followed by centrifugation for 3 min at 7000 rpm in a microfuge; this step was repeated twice. The membrane proteins in the detergent phase and the lumenal proteins in the aqueous phase were then precipitated by addition of a fourfold excess of isopropanol and ethanol, respectively, at -20°C for 48 h. After centrifugation for 10 min at 14,000 rpm and 4°C, the pellets were washed with 400 µl of the respective alcohol. The detergent pellet and the ethanol precipitate from the aqueous phase were finally dissolved in IEF sample buffer, and aliquots were analyzed together with the input sample by mini-2-D PAGE.
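The stated final detergent concentration follows from the usual dilution relation C1·V1 = C2·V2. A minimal sketch, assuming the volumes are additive (900 µl buffer plus 100 µl detergent stock giving 1000 µl):

```python
def final_concentration(stock_percent, stock_volume_ul, total_volume_ul):
    """Solve C1*V1 = C2*V2 for C2 (volumes in microlitres)."""
    if total_volume_ul <= 0 or stock_volume_ul > total_volume_ul:
        raise ValueError("invalid volumes")
    return stock_percent * stock_volume_ul / total_volume_ul


# 100 ul of 20% Triton X-114 in a final volume of 900 + 100 ul
print(final_concentration(20.0, 100.0, 900.0 + 100.0))  # 2.0
```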

Two-Dimensional  Electrophoresis 

2-D PAGE was performed essentially according to Klose (1989) and Jungblut and Seifert (1990). Ready-to-use gel solutions for the first dimension were obtained from the Wittmann Institute of Technology and Analysis of Biomolecules (WITA), Technology Center Teltow.

Glass  tubes  (inner  diameter  1.5  mm,  length  9.3  cm)  were  filled  with  degassed 
separation  gel  solution  consisting  of  9.2  M  urea,  2%  Triton  X-100,  4%  acrylamide,  0.3% 
bisacrylamide,  2%  Servalyt  ampholytes  (four  parts  pH  5-7,  one  part  pH  3-10),  0.14% 
TEMED,  and  0.02%  ammonium  persulfate. 

Electrophoresis in the first dimension was carried out under non-equilibrium conditions (NEPHGE, non-equilibrium pH gradient electrophoresis), with the samples applied at the acidic end of the gels (anodic isoelectric focusing). The cathode buffer (lower chamber) consisted of 9 M urea, 5% glycerol and 5% ethylenediamine; the anode buffer (upper chamber) consisted of 3 M urea and 4.25% phosphoric acid. Alcohol precipitates of the total ER proteins and samples from the detergent extraction procedure were dissolved in 10 to 20 µl of buffer (theoretically about 2.5-5 µg protein/µl) consisting of 9.5 M urea, 2% Triton X-100 and 5% β-mercaptoethanol, and applied onto the Sephadex gel at the upper, acidic end of the first-dimension gel. The samples were covered with overlay solution. Gels were run for 3 h at 500 V. The gels were then extruded from the tubes, incubated for 10 min in a solution of Tris-phosphate buffer, glycerol and SDS, and stored at -70°C until use.
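Two numbers in this protocol can be cross-checked with simple arithmetic: a run of 3 h at 500 V corresponds to the 1500 volt-hours mentioned in the Results, and the theoretical load of 50 µg (also given in the Results) dissolved in 10-20 µl gives the quoted 2.5-5 µg/µl. The sketch assumes constant voltage over the whole run.

```python
def volt_hours(voltage_v, hours):
    """Run length in volt-hours, assuming constant voltage."""
    return voltage_v * hours


def loading_concentration(protein_ug, volume_ul):
    """Nominal protein concentration (ug/ul) of a redissolved pellet."""
    return protein_ug / volume_ul


print(volt_hours(500, 3))  # 1500
# 50 ug redissolved in 20 ul or 10 ul
print(loading_concentration(50, 20), loading_concentration(50, 10))  # 2.5 5.0
```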

SDS gel electrophoresis was performed using a Mini Protean II cell from BioRad. Rod gels were placed on top of 10% Laemmli gels (0.75 mm, 6 × 9 cm) and embedded in 1% agarose/buffer solution. The gels were run for 1 h at 30 V, followed by 2.5 h at 60 mA.

Blotting  and  Sequencing 

Immediately after 2-D separation, the proteins were electrophoretically transferred onto ProBlott™ membranes under semi-dry (SD) conditions using a Trans-Blot SD electrophoretic transfer cell from BioRad. The transfer buffer consisted of 10 mM CAPS, pH 11, containing 10% methanol and 0.07% SDS (SDS only in the cathode buffer). Blotting was performed for 2.5 h at a constant current per area of 1 mA/cm². The membranes were stained with Coomassie Blue for 1-2 min, destained several times in 50% methanol and dried.
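Because the blotting current is specified per unit area, the absolute current depends on the blot size. A minimal sketch, assuming the blotted area equals the 6 × 9 cm mini gel and that one gel is transferred at a time:

```python
def blotting_current_ma(width_cm, height_cm, ma_per_cm2=1.0):
    """Constant current (mA) for semi-dry blotting, scaled to blot area."""
    return width_cm * height_cm * ma_per_cm2


print(blotting_current_ma(6, 9))  # 54.0
```

Several gels blotted together would need the current scaled to the combined area.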

N-terminal sequencing was carried out after direct loading of the PVDF-blotted protein spots onto the blot cartridge of a 477A pulsed-liquid-phase sequencer connected on-line to a 120A phenylthiohydantoin amino acid analyzer (Applied Biosystems, Foster City, CA).

RESULTS 

To analyze proteins of the endoplasmic reticulum, we used rough microsomes from dog pancreas, which can be obtained in high purity. The ER proteins were further separated into integral membrane proteins and soluble proteins by phase separation with TX-114 (Bordier, 1981); hydrophobic and hydrophilic proteins are known to partition into the detergent phase and the aqueous phase, respectively.

The proteins present in both fractions were separated by mini-2-D PAGE (Fig. 1A, TX-114 supernatant; Fig. 2A, TX-114 detergent phase). In the first dimension, the gels were run for only a short period (1500 volt-hours) toward the cathode, as recommended by O'Farrell et al. (1975). This procedure permits the separation of proteins at higher pH values and is referred to as non-equilibrium pH gradient electrophoresis (NEPHGE). Under the conditions chosen, even very basic proteins remained in the gel.

The results show that more proteins are found in the fraction of soluble proteins than in the membrane sample, especially on the acidic, left side of the gel (Figure 1A). The basic membrane proteins are separated as rather broad spots (Figure 2A). The surprisingly few spots of membrane proteins represent, of course, only the most abundant proteins, particularly because of the limited amount of sample that can be loaded without compromising the separation (theoretically, 50 µg of the alcohol-precipitated sample pellets were redissolved in the sample buffer of the first-dimension gels). In addition, some proteins may not have been completely solubilized or may have failed to enter the gel (Hjelmeland, 1990). Nevertheless, very different polypeptide maps were obtained from the different fractions (see Figures 1 and 2), and the transfer of the proteins from the gel onto the PVDF support was complete (data not shown).

As shown in Table 1, twenty-one proteins were subjected to N-terminal sequence analysis, using one to four Coomassie Blue-stained spots cut out from identical positions. Seven proteins (spots 1, 3, 4, 5, 6, 7 and 16) could be unambiguously identified by comparing the determined sequences with those in the database, using the FASTA program of the Genetics Computer Group. One protein with an unambiguous sequence (spot 0) could not be identified. The proteins of spots 2 and 15 appear to be blocked at their N-terminus, because no N-terminal sequence could be obtained even though their staining intensity was comparable to that of other identified proteins from this region. Because of the limited amount of material, a few spots (11, 12, 14, 18, 19 and 20) yielded only preliminary sequencing results (marked by double parentheses in Table 1), which were insufficient to detect homologies to other proteins. The proteins in spots 8, 9, 13 and 17 could not be sequenced either. Interestingly, spot 10 turned out to be VIP36, hitherto not reported to be resident in the ER membrane.


Figure 1. TX-114 Supernatant (Lumenal ER Proteins).

  Spot 3, protein disulfide isomerase (PDI, 55 kDa)
       APEEEDDVLVLNKXN
    ..GAPDEEDHVXVLHKGNF..

  Spot 4, PDI family (49 kDa)
        LYXSXDDVIELTPSX
    ..SGLYSSSDDVIELTPSMFN..

  Spot 7, HIP 70 (70 kDa)
        SDVLEXTDDNFE
    ..AASDVLELTDENFE..

  Spot 16, EF1β (25 kDa)
     GFGDLKSPAGLQVLNDYLAD
    MGFGDLKSPAGLQVLNDYLADKSY..

  Spot 6, triacylglycerol-acylhydrolase (52 kDa)
        KEVXYEQIGX
    ..KAKEVXYEQIGCFSD...

  Also labeled: spot 5, TRAP β (22 kDa).

Figures 1 and 2. Panels 1A and 2A show the respective Coomassie Blue-stained gels, only to illustrate the protein pattern. After transfer (1B and 2B), polypeptide spots (one to four, selected from several blots) were cut out from the ProBlott membrane for N-terminal microsequencing. The upper amino acid sequences are those determined from the polypeptide spots; the lower ones are those of known proteins retrieved from the database with the FASTA program. Amino acid residues are given in the one-letter code; X means that the assignment of a phenylthiohydantoin (PTH) amino acid was uncertain.


Figure 2. TX-114 Detergent Phase (Integral ER Membrane Proteins). See caption for Figure 1.

  Spot 10, VIP36 (36 kDa)
        DXTDGNXEXL
    ..VADITDGNSEHLKRE..

  Spot 5, TRAP β
         EEGARLLASKXLLNRYAVEG
    ..SHAEEGARLLASKSLLNRYAVEGRDL...

  Also labeled: spot 6, triacylglycerol-acylhydrolase.
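The identifications rest on sequence comparison with database entries. As a much simpler check than FASTA (which finds and scores local alignments), the identity of a determined N-terminal sequence against a known sequence can be counted over an ungapped alignment at a chosen offset, skipping positions marked X. The function and its offset convention are illustrative assumptions; the sequences are those shown in the figures.

```python
def ungapped_identity(query, subject, offset=0, unknown="X"):
    """Count identical residues with query[0] aligned to subject[offset].

    Positions where either residue is the unknown symbol are skipped.
    Returns (identical, compared). Not a substitute for FASTA.
    """
    identical = compared = 0
    for i, aa in enumerate(query):
        j = i + offset
        if j >= len(subject):
            break
        if unknown in (aa, subject[j]):
            continue
        compared += 1
        if aa == subject[j]:
            identical += 1
    return identical, compared


# TRAP beta (spot 5), EF1 beta (spot 16) and VIP36 (spot 10)
print(ungapped_identity("EEGARLLASKXLLNRYAVEG", "SHAEEGARLLASKSLLNRYAVEGRDL", 3))
print(ungapped_identity("GFGDLKSPAGLQVLNDYLAD", "MGFGDLKSPAGLQVLNDYLADKSY", 1))
print(ungapped_identity("DXTDGNXEXL", "VADITDGNSEHLKRE", 2))
```

All three comparisons give complete identity at the assigned positions, consistent with the identifications; a real search would also have to establish the offset and assess significance.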


DISCUSSION 

We have initiated a systematic analysis of the proteins present in the ER membrane. This organelle contains a large number of both membrane-bound and soluble proteins, which are involved in various functions, such as protein translocation (Rapoport, 1992a; Rapoport et al., 1992b; Jungnickel et al., 1994), protein modification (Hartley and Helenius, 1989), phospholipid biosynthesis and detoxification reactions. The final goal of this effort is to obtain a full catalog of the ER-resident proteins, at least of the most abundant ones. So far, a similar endeavor has only been attempted for synaptic vesicles (Südhof et al., 1993); other systematic approaches have used total cells. Prefractionation of cells is expected to yield increased resolution and a higher number of identifiable proteins. In recent years, many efforts have been made to obtain microsequence information for polypeptide spots isolated directly from 2-D PAGE gels (Vandekerckhove et al., 1985; Aebersold et al., 1987; Matsudaira, 1987; Rasmussen et al., 1991; Celis et al., 1991; Baker et al., 1992; Hughes et al., 1992). With a standard mini-2-D gel, a protein must constitute 0.1-1% of the loaded mixture to be detectable.


Table 1. N-terminal sequencing results from 2-D blots

Spot   Spots sequenced   PTH initial    Sequence                    Protein (NBRF database)
no.    (blots)(a)        yield (pmol)
0      4 (1,3)           0.5            DAVVSED(P)G                 ?
1      1 (1)             3.2            EEED(K)KEDVG                heat shock protein family (HSP 70)
2      4 (1,3)           -              blocked
3      3 (1,3)           4.1            APEEEDDVLVLN(K)XN           protein disulfide-isomerase (PDI)
4      4 (1,3)           2.3            LY(V)SXDDVIELTPS            canine homologous protein of the PDI family
5      2 (1,2,3)         8.3            EEGARLLASKXLLNRYAVEG        translocon-associated protein β (TRAP β)(b)
6      4 (1,2,3)         0.8            K(E)VXYEQIGX(F)             triacylglycerol-acyl-hydrolase
7      3 (1,3)           2.4            SDVLELTDDNF(E)              hormone-induced protein 70 (HIP 70) or phospholipase Cα
8      2 (1,2)           -              blocked or too little material
9      2 (1,2)           -              blocked or too little material
10     2 (1,2)           2.7            (D)X(T)DGNXEXL              VIP36
11     1 (2)             0.7            ((NELTQ))                   ?
12     1 (2)             1.9            ((XXAG(S)XGGNLX))           ?
13     1 (2)             -              blocked or too little material
14     1 (2)             2.3            ((G)XP(G)AX(T)(G)(L)(E))    ?
15     4 (1,3)           -              blocked
16     4 (1,3)           3.8            GFGDLKSPAGLQVLNDYLAD        elongation factor-1-β (EF1β)
17     2 (3)             -              blocked or too little material
18     3 (1,3)           1.2            (((P)XGQ(E)AEEG))           ?
19     4 (1,3)           0.9            (((K)EVXFPXXGXX(Y)DD))      ?
20     2 (3)             1.2            ((NX(R)T(G)NX(D)IT))        ?

(a) Blots from (1) total protein; (2) Triton X-114 pellet; (3) Triton X-114 supernatant.
(b) Görlich et al., 1990; Hartmann et al., 1993.

Of course, if total cellular protein is applied, only a few of the most abundant proteins can be detected. In our approach, we not only analyzed a purified cell organelle but also separated its proteins into hydrophobic, membrane-bound ones and hydrophilic, soluble ones.

By visual inspection, one to four spots from comparable positions on several blots were selected for structural characterization by automated microsequence analysis. The results are given in Table 1. In spite of the limited quantities of a few proteins, the identity of seven proteins from a total of twenty-one analyzed spots could be established by comparing the amino acid sequences obtained with those available in the database. Surprisingly, elongation factor-1-β (spot 16), known to be a highly abundant cytosolic protein (Sanders et al., 1991), was found in the TX-114 supernatant, implying that some cytosolic proteins were not completely separated from the microsomal preparation. Two of the proteins identified, the lipases triacylglycerol-acylhydrolase (spot 6) and hormone-induced protein HIP-70 (spot 7), are reported to be secretory proteins (Mickel et al., 1989; Mobbs et al., 1990). Whether their presence in our ER supernatant preparations reflects substantial amounts of these proteins in the ER on their way to the cell exterior, or whether they represent ER-resident isoforms of these proteins, remains an open question.

The translocon-associated protein TRAP β (spot 5) and one of the lipases (spot 6) are present in both the TX-114 supernatant and the detergent phase. In the case of the lipase, this behavior is probably due to the function of the enzyme, which binds at lipid interfaces to digest fats. The occurrence of detectable amounts of TRAP β in the TX-114 supernatant, however, is not quite clear, since TRAP β has been shown to be part of a larger protein complex in the ER membrane (Görlich et al., 1990). It seems possible that a small proportion of TX-114 remained in the aqueous phase, preventing complete separation of this major component from the others, particularly since TRAP β, with only a single membrane-spanning region and two attached carbohydrates, also has some hydrophilic character.

Three of the polypeptides found, the protein disulfide-isomerases (spots 3 and 4) and a heat shock protein (spot 7), were localized exclusively to the lumenal fraction. In fact, they belong to the "welcoming committee" (Hartley and Helenius, 1989) of enzymes and factors thought to play an important role in protein folding in vivo, by reshuffling disulfides to accelerate proper folding (Freedman, 1989) and by associating with polypeptide intermediates to prevent their aggregation and misfolding, respectively (Munro and Pelham, 1986).

Finally, the protein of spot 10 from the detergent fraction (Figure 2B) turned out to be identical to the very recently described VIP36, a vesicular integral membrane protein (Fiedler et al., 1994). Interestingly, Fiedler et al. reported that attempts to microsequence the N-terminus of the mature protein were unsuccessful. The N-terminal sequence we found, DXTDGNXEXL (see Table 1), is consistent with the putative signal sequence cleavage site deduced from the cDNA by Fiedler et al.

VIP36 is assumed to be involved in protein sorting between the Golgi and the cell surface. Fiedler et al. therefore also analyzed the subcellular localization of VIP36, establishing its occurrence in the Golgi apparatus, endosomal and vesicular structures and the plasma membrane by immunoelectron microscopy and immunofluorescence. The unexpected finding of this protein in our TX-114 detergent fraction, which suggests that it also resides in the ER membrane, remains to be verified.

ABSTRACT

This paper summarizes some fundamental statistical mechanical aspects of protein folding and the prediction of protein structure. To predict the native structure of a protein, it is necessary to understand the physical conditions that determine its unique and thermodynamically stable native structure, and to surmount the numerous local energy minima in order to arrive at that structure. A statistical mechanical approach has been used to address the foldability of polypeptides, and global minimization techniques have been developed to solve the multiple-minima problem. Some recent progress in these areas, made in our laboratory, is described.


INTRODUCTION 

A problem of much interest in protein chemistry is to determine how interatomic interactions dictate the folding of a polypeptide chain into the three-dimensional structure of the biologically active native protein. The underlying thermodynamic theory for an understanding of this process was developed previously (Scheraga, 1968; Go and Scheraga, 1969, 1976), and subsequent use of lattice models, analytical theories and computer simulation (Taketomi et al., 1975; Dill, 1985; Dill et al., 1989; Shakhnovich and Finkelstein, 1989; Shakhnovich and Gutin, 1989, 1990, 1993; Bryngelson and Wolynes, 1987, 1989; Skolnick and Kolinski, 1990, 1991; Kolinski and Skolnick, 1992; Covell and Jernigan, 1990; Honeycutt and Thirumalai, 1990; Camacho and Thirumalai, 1993; Leopold et al., 1992; Fukugita et al., 1993; Sali et al., 1994; Hao and Scheraga, 1994a,b, 1995) has continued to contribute to a deeper understanding of the problems involved. Two problems are currently of central importance in theoretical studies of protein folding: (1) determination of the conditions that lead to the unique and thermodynamically stable native structure of a protein, and (2) accurate calculation of the native structure of a protein among its many local minimum-energy conformations. The solution of the first problem can greatly facilitate, but not replace, the solution of the second. In this article, we review the progress that has been made in our laboratory in solving these problems.


SOME  STATISTICAL  MECHANICAL  ASPECTS 

We have recently presented three papers (Hao and Scheraga, 1994a,b, 1995), designated here as papers I, II and III, dealing with some statistical thermodynamic aspects of protein folding. Paper I deals with the order of the folding transition of a protein, paper II with the effects of polypeptide sequence on the folding transition, and paper III with some aspects of the influence of the potential function on the folding of a given protein. Although each of these papers focused on a different problem, they were all based on the same underlying theme, i.e. defining the statistical mechanical characteristics of the folding transitions of polypeptides in terms of their densities of states.

Clearly,  knowing  that  a  given  polypeptide  has  a  unique  and  thermodynamically  stable 
folded  form  is  the  first  requirement  or  condition  for  folding  the  protein  to  its  native  structure 
by  theoretical  methods.  More  generally,  it  is  of  interest  to  know  under  what  conditions,  in 
terms  of  either  the  sequence  or  the  force  fields,  polypeptides  will  have  a  unique  and  stable 
folded  form.  A  statistical  mechanical  formalism  has  been  used  to  answer  these  questions. 
For  practical  applications,  it  is  desirable  to  identify  the  quantities  or  features  that  characterize 
the  foldability  of  polypeptides.  A  sufficient  number  of  case  studies  should  be  carried  out  to 
test these criteria. Our papers addressed these problems directly and are therefore relevant
to the problem considered here.

Much work on protein folding (e.g. Skolnick and Kolinski, 1991; Kolinski and Skolnick, 1992; Yue and Dill, 1994; Covell and Jernigan, 1990) did not address the foldability of a protein explicitly, but was concerned instead with achieving a reasonably high probability of folding a given protein to its native (or targeted) structure in simulations. Sali et al (1994) proposed that the energy difference between the lowest- and second-lowest-energy structures can be taken as the criterion of foldability of a polypeptide. This proposal, however, was based on sampling only compact conformations of simple lattice chains, and its general validity has been questioned (Bryngelson et al, 1994). Wolynes' group (Goldstein et al, 1992a,b) proposed that the ratio of the folding transition temperature to the glass transition temperature of a polypeptide can be taken as the criterion of foldability. This criterion has a sound physical basis and is elegant. The problem, however, is that the estimate of the glass transition temperature is often highly approximate, which reduces the accuracy of predictions based on this criterion.

We have used the complete density of states to characterize the foldability of a given polypeptide. This is a physically sound criterion; it covers the ranges of applicability of the other criteria and, more importantly, does not suffer from the inaccuracies or approximations of the previous theories. The density of states determines all the thermodynamic properties of the protein. However, because of the complexity of the protein molecule, an analytical determination or an exact enumeration of all conformations of a protein is impossible. Fortunately, it is the relative density of states that determines the thermodynamic properties of a protein, and the relative density of states of realistic model proteins can be determined sufficiently accurately by simulation methods (see below). Figure 1 shows some typical densities of states of model polypeptides determined in our studies.
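The point that only the relative density of states matters can be checked directly. The Python sketch below assumes the density of states is tabulated as ln n(E) on a discrete energy grid and works in units where Boltzmann's constant k = 1; it returns the mean energy and the heat capacity. Adding any constant to ln n(E) leaves both unchanged, because the constant cancels in the normalization. The function name and the grid representation are illustrative choices, not part of the authors' programs.

```python
import math

def thermo_from_dos(energies, log_n, T):
    """Canonical averages from a tabulated density of states (sketch).

    energies : energy levels E_i
    log_n    : ln n(E_i), known only up to an additive constant
    T        : temperature, in units where Boltzmann's constant k = 1
    Returns (<E>, C_v) with C_v = (<E^2> - <E>^2) / T^2.
    """
    # ln of the unnormalized Boltzmann weight n(E) exp(-E/T)
    log_w = [ln - e / T for e, ln in zip(energies, log_n)]
    m = max(log_w)  # subtracting the maximum avoids overflow; it cancels below
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    mean_e = sum(wi * e for wi, e in zip(w, energies)) / z
    mean_e2 = sum(wi * e * e for wi, e in zip(w, energies)) / z
    return mean_e, (mean_e2 - mean_e ** 2) / T ** 2
```

For a two-level system with equal degeneracies at E = 0 and E = 1, `thermo_from_dos([0.0, 1.0], [0.0, 0.0], 1.0)` gives a mean energy of 1/(1 + e), as expected from the Boltzmann factor.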

From the density of states, many thermodynamic properties of the protein, such as its average energy and heat capacity, can be calculated. Of particular interest is the free energy of the protein as a function of its energy (and temperature), F(E,T), which is determined by the relationship:
Figure 1. Illustration of different types of densities of states of polypeptides obtained from Monte Carlo simulations. S(E) is the entropy as a function of energy E; S(E) is determined by the logarithm of the density of states (Hao and Scheraga, 1995).


$$F(E,T) = E - T\,S(E) \qquad (1)$$

where E is the energy and S(E) = k ln[n(E)], with n(E) being the density of states and k Boltzmann's constant. Figure 2 compares the probability of occurrence of the conformations of two different polypeptides near their transition temperatures. It can be seen that, while the dominant conformational distributions of the two polypeptides are similar below and above the transition temperatures, the two molecules exhibit quite different behavior at the transition temperature. The one on the left shows a bimodal distribution and therefore undergoes a first-order transition; the one on the right shows a flat distribution and is expected to undergo a continuous transition.
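A minimal sketch of eq. (1) and of the distinction drawn in Figure 2, assuming a tabulated ln n(E) and units with k = 1: the probability of finding the molecule at energy E is P(E,T) ∝ n(E) exp(−E/T) = exp[−F(E,T)/T], so counting the maxima of ln P(E,T) separates a bimodal (first-order-like) distribution from a single-peaked one. The helper names are hypothetical.

```python
import math

def free_energy_profile(energies, log_n, T):
    """F(E,T) = E - T S(E) with S(E) = ln n(E), i.e. eq. (1) in units k = 1."""
    return [e - T * ln for e, ln in zip(energies, log_n)]

def log_prob(energies, log_n, T):
    """ln P(E,T) up to an additive constant: P(E,T) ~ exp(-F(E,T)/T)."""
    return [-f / T for f in free_energy_profile(energies, log_n, T)]

def count_peaks(values):
    """Number of strict local maxima, treating points beyond the ends as -inf."""
    n = len(values)
    peaks = 0
    for i in range(n):
        left = values[i - 1] if i > 0 else -math.inf
        right = values[i + 1] if i < n - 1 else -math.inf
        if values[i] > left and values[i] > right:
            peaks += 1
    return peaks
```

Evaluated at the transition temperature, a count of 2 corresponds to the bimodal case on the left of Figure 2 and a count of 1 to a single broad peak; the toy densities of states one would feed in for testing are made up.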

On the basis of this approach, we have studied a number of specific problems of protein folding. Some of our findings are: (1) even relatively small polypeptides can undergo a first-order folding transition; (2) depending on the sequence and potential function, there are three possible types of folding transitions: first-order, continuous, and glass-like; (3) short-range potentials and long-range interactions have different effects on the features of the folding transition of polypeptides; and (4) the density of states is a sensitive indicator of the behavior of protein folding. The characteristics of a good folding sequence under a proper force field are reflected in its density of states: the curve has the character of a first-order transition (i.e. it contains a concave segment), the folded states are well separated from the unfolded states, and there are discrete states in the lowest-energy region.
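Why a curvature feature of S(E) signals a first-order transition follows from ln P(E,T) = S(E)/k − E/kT + const: the second term is linear in E, so if S(E) bends downward everywhere, ln P(E,T) has a single maximum at every temperature, and a bimodal P(E,T) requires a segment where S(E) bends upward and dips below the chord joining its neighbours. We take the "concave segment" above to be such a segment; that reading, the uniform energy grid, and the function name are assumptions of this sketch, which locates the segment from second differences of a tabulated ln n(E).

```python
def anomalous_segments(log_n, tol=1e-12):
    """Index ranges (start, end) where ln n(E) bends upward.

    Uses the second difference ln n(E[i-1]) - 2 ln n(E[i]) + ln n(E[i+1])
    on a uniformly spaced energy grid; a positive value means S(E) lies
    below the chord of its neighbours at grid point i.
    """
    runs = []
    start = None
    for i in range(1, len(log_n) - 1):
        d2 = log_n[i - 1] - 2.0 * log_n[i] + log_n[i + 1]
        if d2 > tol:
            if start is None:
                start = i          # a new upward-bending run begins
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(log_n) - 2))
    return runs
```

An empty result means ln P(E,T) is single-peaked at every temperature for that density of states; a non-empty result marks where a bimodal distribution can arise.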

To describe the density of states obtained from simulations quantitatively, we have developed an analytical formalism (paper III). Our theory, following the spirit of Bryngelson and Wolynes (1987), is based on two components: (1) a mean-field representation of the energies of the protein and (2) a random distribution of energies in a subset of conformations defined by the fraction of native residues. The probability that an average residue has energy ε is expressed as

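$$P(\varepsilon) = \frac{1}{\sqrt{2\pi}\,\delta(\rho)}\exp\left\{-\frac{[\varepsilon-\bar{\varepsilon}(\rho)]^{2}}{2\,\delta^{2}(\rho)}\right\} \qquad (2)$$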


Figure 2. Comparison of the probability of occurrence P(E) of conformations of two different polypeptides above the transition temperature (bottom), at the transition temperature (middle), and below the transition temperature (top) (Hao and Scheraga, 1994b). The vertical axis of each panel is log P(E).


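where ρ is the fraction of native residues, and ε̄(ρ) and δ(ρ) are defined by

$$\bar{\varepsilon}(\rho) = \varepsilon_0 + \varepsilon_1\rho + \varepsilon_2\rho^{2} \qquad (3)$$

$$\delta(\rho) = \left[\delta_0 - \delta_1\rho - \delta_2\rho^{2}\right]^{1/2} \qquad (4)$$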

Other properties of interest can be derived on the basis of the above expressions. In this theory, each polypeptide is defined by three energy parameters ε0, ε1 and ε2 and three distribution parameters δ0, δ1 and δ2. Once these parameters are determined, the folding behavior of the polypeptide is completely defined by the theory. We have developed a procedure to extract the theoretical parameters from the simulated density of states of a given polypeptide (paper III). In this way, the folding characteristics of a protein can be defined quantitatively, with added insight from the theory about the nature of the interactions that determine them. It has been shown that this theory fits the simulation data very well. The work can be used as a basis for practical applications such as sequence design, the refinement of force fields, and folding a given protein.
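The extraction procedure of paper III is not given here. As a simplified, hypothetical illustration only, suppose the simulation supplies, at several values of ρ, the mean and the variance of the residue energy. Since eq. (3) is quadratic in ρ and eq. (4) makes δ²(ρ) quadratic in ρ, the six parameters then follow from two polynomial least-squares fits, done here with NumPy's `polyfit`.

```python
import numpy as np

def fit_mean_field_parameters(rho, mean_eps, var_eps):
    """Least-squares fit of eqs. (3) and (4) (hypothetical input format).

    rho      : fractions of native residues at which statistics were taken
    mean_eps : mean residue energy at each rho, modelled as eps0 + eps1*rho + eps2*rho**2
    var_eps  : variance of residue energy at each rho, modelled as
               delta(rho)**2 = d0 - d1*rho - d2*rho**2
    Returns ((eps0, eps1, eps2), (d0, d1, d2)).
    """
    # np.polyfit returns coefficients from the highest power down
    e2, e1, e0 = np.polyfit(rho, mean_eps, 2)
    c2, c1, c0 = np.polyfit(rho, var_eps, 2)
    # eq. (4) carries minus signs on the rho terms
    return (e0, e1, e2), (c0, -c1, -c2)
```

With noisy simulation statistics, the same call gives the least-squares estimates; with exact quadratic input it recovers the generating parameters.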

We now briefly describe the simulation method for determining the density of states of proteins; this is not a trivial problem. The so-called entropy sampling Monte Carlo (ESMC) method (Lee, 1993) played a key role in our simulations (the name 'ESMC' is likely to be changed in the future on the basis of arguments in paper III). The essence of the method is a Monte Carlo simulation with a probability distribution based on a scaling function; this scaling function is refined in a series of simulations until, at convergence, the sampled energy distribution becomes uniform. The biased sampling technique that we introduced into the ESMC method (Hao and Scheraga, 1994a) is a critical element that makes the method work for simulating proteins. From our experience in simulating proteins with the ESMC method, we have identified three advantages of the ESMC method over the conventional Monte Carlo method in determining the density of states of proteins: the ESMC method has a clearly defined criterion of convergence, it has a self-feedback mechanism, and it is much less prone to being trapped in a particular local energy minimum. We also introduced a jump-walking procedure into ESMC simulations and employed parallel programming. The experience gained in using the ESMC method to simulate the density of states of model polypeptides will be useful in our further efforts to fold realistic proteins.
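A toy sketch of the entropy-sampling idea (Lee, 1993) under simplifying assumptions of our own: the polypeptide is replaced by N independent two-state units, with E equal to the number of units in state 1, so the exact density of states is the binomial coefficient C(N, E). Moves are accepted with probability min[1, exp(S(E_old) − S(E_new))], the entropy estimate is updated as S ← S + ln H after each run, and a run counts as converged when the energy histogram H is flat. The biased sampling and jump-walking described above are not included, and all names and parameter values are hypothetical.

```python
import math
import random

def esmc_toy(n_units=10, steps_per_iter=100_000, max_iter=20, flatness=0.85, seed=1):
    """Entropy-sampling Monte Carlo on a hypothetical toy model.

    n_units independent two-state units; E = number of units in state 1,
    so the exact density of states is n(E) = C(n_units, E).
    Returns the estimate of ln n(E) - ln n(0) for E = 0 .. n_units.
    """
    rng = random.Random(seed)
    n_levels = n_units + 1
    S = [0.0] * n_levels          # current estimate of S(E)/k, up to a constant
    state = [0] * n_units
    E = 0
    for _ in range(max_iter):
        H = [0] * n_levels        # energy histogram of this run
        for _ in range(steps_per_iter):
            i = rng.randrange(n_units)
            E_new = E + (1 if state[i] == 0 else -1)
            # sampling weight exp(-S(E)): accept with min(1, exp(S(E) - S(E_new)))
            dS = S[E] - S[E_new]
            if dS >= 0 or rng.random() < math.exp(dS):
                state[i] ^= 1
                E = E_new
            H[E] += 1
        # convergence criterion: the histogram is (nearly) uniform
        converged = min(H) > flatness * (sum(H) / n_levels)
        # feedback: S(E) <- S(E) + ln H(E) for every visited level
        for e in range(n_levels):
            if H[e] > 0:
                S[e] += math.log(H[e])
        if converged:
            break
    return [s - S[0] for s in S]
```

The first run, with S = 0, samples the unweighted binomial distribution; each later run flattens the histogram further, which is the self-feedback and the convergence criterion mentioned above.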

Since  the  native  structure  emerges  as  the  system  is  cooled,  efforts  have  been  devoted 
to  computing  it  accurately  with  empirical  potential  energy  functions.  For  this  purpose,  it  is 
necessary  to  (i)  generate  an  arbitrary  starting  conformation,  (ii)  compute  its  conformational 
energy  (with  entropy  and  hydration  contributions  included),  and  (iii)  locate  the  global 
minimum  of  the  conformational  energy.  Adequate  procedures  are  available  for  steps  (i)  and 
(ii)  and  for  minimizing  the  conformational  energy  (Scheraga,  1992).  The  minimization 
procedure,  however,  leads  only  to  the  local  minimum  closest  to  the  starting  conformation, 
rather  than  to  the  global  minimum;  this  is  the  multiple-minima  problem  (Gibson  and 
Scheraga,  1988;  Scheraga,  1992).  It  is,  therefore,  necessary  to  have  efficient  methods  for 
searching  conformational  space  to  locate  the  lowest  minimum  among  all  those  in  the  whole 
space.  The  next  section  describes  one  of  our  methods  that  has  been  developed  to  facilitate 
such a search of conformational space. Details of other methods are discussed elsewhere
(Scheraga, 1992; Vasquez et al, 1994).


THE  DIFFUSION  EQUATION  METHOD  (DEM) 

The DEM is based on the use of the diffusion equation to deform the complex energy hypersurface in successive stages so that higher-energy minima disappear and only a descendant of the global minimum remains. A reversal of the deformation procedure then recovers the global minimum of the original potential function (Piela et al, 1989). This procedure has been applied to a variety of simple mathematical functions (Piela et al, 1989), to a series of clusters of Lennard-Jones particles (Kostrowicki et al, 1991), to water clusters (Wawak et al, 1992), and to terminally-blocked alanine and the pentapeptide Met-enkephalin (Kostrowicki and Scheraga, 1992). In the application to Lennard-Jones particles (Kostrowicki et al, 1991), the Lennard-Jones potential function was expressed as a sum of Gaussians, for which an analytical solution of the diffusion equation is also a sum of Gaussians, but with modified coefficients. Calculations were carried out for cluster sizes n = 5, 6, 7, ..., 55. For n = 55, the number of local minima is astronomically large, the global minimum being the Mackay icosahedron. This global minimum was found by the diffusion equation method (Kostrowicki et al, 1991) in ~400 seconds on an IBM 3090 supercomputer.
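The statement that a sum of Gaussians stays a sum of Gaussians under the diffusion equation can be made concrete. For ∂F/∂t = ΔF in d dimensions, a term a exp(−b|x − c|²) evolves into a(1 + 4bt)^(−d/2) exp[−b|x − c|²/(1 + 4bt)]; the sketch below applies this rule term by term. Any coefficients used with it are arbitrary examples, not the fitted Lennard-Jones coefficients of Kostrowicki et al, which are not given here.

```python
import math

def diffuse_gaussians(terms, t, dim=1):
    """Solution at time t of dF/dt = Laplacian(F) when
    F(x, 0) = sum_k a_k * exp(-b_k * |x - c_k|**2).

    terms: list of (a_k, b_k, c_k) with b_k > 0 and c_k a tuple of length dim.
    Each Gaussian remains a Gaussian with modified coefficients.
    """
    out = []
    for a, b, c in terms:
        s = 1.0 + 4.0 * b * t
        out.append((a / s ** (dim / 2), b / s, c))   # lower, broader Gaussian
    return out

def evaluate(terms, x):
    """Value of the Gaussian sum at the point x (a tuple of coordinates)."""
    return sum(a * math.exp(-b * sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
               for a, b, c in terms)
```

As t grows, every term flattens and spreads, with narrow (large-b) wells flattening fastest; this is why shallow, narrow minima disappear first during the deformation.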

In calculations on oligopeptides (Kostrowicki and Scheraga, 1992), the DEM found the global minimum for the alanine compound in <1 min and for the pentapeptide in ~10 min, using one processor of an IBM 3090 supercomputer. Since the DEM scales as n³, where n is the number of residues, it should take (10 min)(10³), or 10⁴ min (~7 days), to scale up by a factor of 10, i.e. to go from a pentapeptide to a 50-residue protein, using one processor of the IBM 3090 computer. Hence, using all 6 processors of the computer, the computation should take ~7/6 days, i.e. a little over one day.
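The timing estimate above is simple scaling arithmetic. A short sketch, assuming cost proportional to n³ and ideal (linear) speed-up over processors, reproduces the numbers; the function name is hypothetical.

```python
def dem_time_estimate(t_ref_min, n_ref, n_target, exponent=3, processors=1):
    """Scale a reference timing (minutes), assuming cost ~ n**exponent
    and perfect parallel speed-up over `processors`."""
    return t_ref_min * (n_target / n_ref) ** exponent / processors
```

`dem_time_estimate(10, 5, 50)` gives 10⁴ min (about 6.9 days); with `processors=6` it gives about 1.16 days. Real speed-up on six processors would be lower than ideal, so the second figure is a lower bound.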

The DEM is illustrated in Figure 3 by a simple one-dimensional function with two minima. The original function, f(x), can be deformed, in the first iteration, to f^(1)(x) by adding to it a small multiple of its second derivative, f''(x), which is zero at the inflection points, viz.

$$f^{(1)}(x) = f(x) + \beta\,\frac{d^{2}f(x)}{dx^{2}} \qquad (5)$$
Figure 3. Original double-minimum potential energy curve f(x) (solid line) transformed, according to eq. 5, into a curve with only a single minimum (dashed line). The values of the transformed function at the inflection points do not change. The particular function used in this Figure has parameters a and b equal to 2 and 0.9, respectively. When β = 0.02, one obtains f(x) + βf''(x), which exhibits only one minimum (Piela et al, 1989).


where β is a small positive constant. Repeated application of this procedure leads, at the Nth iteration, to the result


$$f^{(N)}(x) = \left(1 + \frac{t}{N}\,\frac{d^{2}}{dx^{2}}\right)^{N} f(x) \qquad (6)$$


where t/N has been written for β, with the parameter t being positive. Destabilization of the surface is most effective when N → ∞. Taking this limit, we may write


$$F(x,t) = \lim_{N\to\infty}\left(1 + \frac{t}{N}\,\frac{d^{2}}{dx^{2}}\right)^{N} f(x) = \exp\left(t\,\frac{d^{2}}{dx^{2}}\right) f(x) \qquad (7)$$


It can be shown that, equivalently, F(x,t) is a solution of the diffusion equation


$$\frac{\partial^{2}F}{\partial x^{2}} = \frac{\partial F}{\partial t} \qquad (8)$$

where the parameter t takes on the meaning of “time”, with the initial condition being F(x,0) = f(x). Equations 5-8 and Figure 3 serve only to show how the DEM was originally derived (Piela et al, 1989); in current applications, the diffusion equation is itself the starting point for the computations.
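The passage from the finite iteration of eq. (6) to the limit in eq. (7) can be checked numerically. For a polynomial, exp(t d²/dx²) f = Σ_k (t^k/k!) f^(2k)(x) is a finite sum, so the sketch below represents polynomials as coefficient lists and compares N applications of (1 + (t/N) d²/dx²) with the exact operator. The test polynomial x⁴ − 2x² + 0.5x is our own choice, not the function of Figure 3.

```python
import math

def second_derivative(coeffs):
    """d^2/dx^2 of a polynomial [c0, c1, c2, ...], where c_k multiplies x**k."""
    return [k * (k - 1) * coeffs[k] for k in range(2, len(coeffs))]

def add(p, q):
    """Sum of two coefficient lists of possibly different lengths."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(n)]

def iterate_eq6(coeffs, t, N):
    """Apply (1 + (t/N) d^2/dx^2) N times, as in eq. (6)."""
    p = list(coeffs)
    for _ in range(N):
        p = add(p, [t / N * c for c in second_derivative(p)])
    return p

def exp_operator(coeffs, t):
    """exp(t d^2/dx^2) f = sum_k t**k / k! * f^(2k), finite for a polynomial (eq. 7)."""
    result, term, k = list(coeffs), list(coeffs), 0
    while True:
        term = second_derivative(term)
        k += 1
        if not term:
            return result
        result = add(result, [t ** k / math.factorial(k) * c for c in term])
```

For f = x⁴ − 2x² + 0.5x and t = 0.25, `exp_operator` gives x⁴ + x² + 0.5x − 0.25, and `iterate_eq6` with N = 10000 agrees with it to about 10⁻⁴ in each coefficient, the discrepancy shrinking as 1/N.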

In higher dimensions, d²/dx² is replaced by the Laplacian, Δ = Σ_j ∂²/∂x_j², so that the diffusion equation becomes

$$\Delta F = \frac{\partial F}{\partial t} \qquad (9)$$

The successive deformations of the one-dimensional function of Figure 3 from t = 0 to t_0 = 0.25, and the reversal from t_0 = 0.25 back to t = 0, are illustrated in Figure 4. It can be seen how the global minimum of the original function is reached.

In the diffusion equation method, the original potential surface is the analogue of a varying concentration, which becomes uniform as t → ∞. Thus, as t → ∞, all minima would disappear, and the surface would become uniformly flat. However, for a sufficiently large, finite time, t_0, only one minimum (a descendant of the global minimum) remains.
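A one-dimensional sketch of the full procedure, deformation followed by reversal with a local minimization at each step, using a hypothetical quartic f(x) = x⁴ − 2x² + 0.5x (our choice, not the function of Figures 3 and 4). For a quartic, the deformed function is exactly F(x,t) = f + t f'' + (t²/2) f''''. The schedule t = 0.25, 0.15, 0.10, 0.05, 0.02, 0 mirrors the values listed for Figure 4; the local minimizer is plain gradient descent with backtracking, chosen only for brevity.

```python
# Hypothetical double-well test function (not the function of Figs. 3 and 4)
def f(x):
    return x**4 - 2*x**2 + 0.5*x

def F(x, t):
    """exp(t d^2/dx^2) f = f + t f'' + (t^2/2) f'''' (exact for a quartic)."""
    return f(x) + t * (12*x**2 - 4) + 0.5 * t * t * 24

def dF(x, t):
    """dF/dx = f'(x) + t f'''(x)."""
    return 4*x**3 - 4*x + 0.5 + t * 24*x

def local_min(x, t, tol=1e-6, max_iter=10_000):
    """Gradient descent with backtracking (Armijo) line search on F(., t)."""
    for _ in range(max_iter):
        g = dF(x, t)
        if abs(g) < tol:
            break
        step = 1.0
        while step > 1e-12 and F(x - step*g, t) > F(x, t) - 0.5*step*g*g:
            step *= 0.5
        if step <= 1e-12:
            break  # no further progress possible at floating-point precision
        x -= step*g
    return x

def dem(x_start, schedule=(0.25, 0.15, 0.10, 0.05, 0.02, 0.0)):
    """Deform to the largest t, then reverse, minimizing locally at each t."""
    x = x_start
    for t in schedule:
        x = local_min(x, t)
    return x
```

Started at x = 1.5, a direct local minimization of f stops at the higher minimum near x ≈ 0.93, whereas `dem(1.5)` ends near x ≈ −1.06, the global minimum of this f; at t = 0.25 the deformed curve has a single minimum, so the result does not depend on the starting point.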

Up  to  now,  the  DEM  has  been  implemented  in  the  space  of  Cartesian  coordinates  of 
the  atoms,  and  an  analytical  solution  of  the  diffusion  equation  was  obtained  by  symbolic 
evaluation  of  the  Fourier-Poisson  integral,  with  the  potential  function  expressed  as  a  sum  of 
Gaussians  or  cut  Gaussians  and  a  cut  Mr  potential.  When  the  DEM  is  applied  to  chain 
molecules, it is necessary to introduce constraints to limit the bond lengths and bond angles
to acceptable values; i.e., although the diffusion equation is solved in Cartesian coordinates,
Figure 4. Illustration of the deformation of the original potential f(x) (the same as in Fig. 3), and of the reversing procedure. The deformation at t_0 = 0.25 leads to the curve with the unique minimum that is achievable from any point of the space by a simple minimization. Then, the reversing procedure (shown by the arrows directed downward) is applied by considering a sequence of the deformed curves at t = 0.15, 0.10, 0.05, 0.02, and finally 0, where the original function is recovered. Each step of the procedure is followed by a minimization, symbolized in the Figure by a ball moving downhill from the minimum position of the upper curve and always reaching the position of the minimum in the lower curve. In the final step, the global minimum is found (Piela et al, 1989).


its  solution  is  examined  on  the  manifold  of  fixed  bond  lengths  and  bond  angles.  However, 
even  for  a  smoothed  function  having  only  one  minimum  in  the  whole  space,  it  is  still  possible 
to  have  more  than  one  constrained  minimum  for  fixed  bond  lengths  and  bond  angles. 

This  problem  can  be  circumvented  by  solving  the  diffusion  equation  in  internal  rather 
than  Cartesian  coordinates.  However,  when  using  internal  coordinates,  it  has  not  yet  been 
possible  to  obtain  an  analytical  solution.  Instead,  the  energy  function  is  expressed  as  a  Fourier 
series,  and  the  solution  of  the  diffusion  equation  for  such  an  energy  function  is  also  a  Fourier 
series  with  coefficients  related  to  those  of  the  original  function  (Kostrowicki  et  al,  1995).  It 
appears  that  a  good  approximate  solution  of  the  diffusion  equation  can  be  obtained  for  large 
t by considering only the lowest-order Fourier coefficients. These coefficients are evaluated by separating the geometrical and physical (energetic) parts of the integral ∫ cos θ_k V_ij(θ_1, ..., θ_n) dθ_1 ... dθ_n. The geometrical part of the integral is a generalization of the end-to-end distance distribution function, and is derived iteratively from the chain geometry (Kostrowicki and Scheraga, 1994). This function is then integrated numerically with the distance-dependent pair interaction function. In the time reversal, the solution of the diffusion equation at small t (ultimately at t = 0) is expressed as an analytical solution of the diffusion equation in the linear subspace tangent to the manifold of geometries with fixed bond lengths and bond angles. This approach is currently undergoing testing and evaluation.
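For the internal-coordinate version, the relation between the Fourier coefficients of the original and the smoothed functions follows from the fact that each Fourier term is an eigenfunction of the Laplacian: a term in cos(k·θ) or sin(k·θ) is multiplied by exp(−|k|²t). The sketch below applies this factor and shows why, for large t, only the lowest-order coefficients matter. The dictionary representation and any example coefficients are hypothetical, and the sketch omits the separation into geometrical and energetic parts described above.

```python
import math

def smooth_fourier(coeffs, t):
    """Diffusion-equation solution for a Fourier series in the angles theta.

    coeffs: dict mapping an integer wave vector k = (k_1, ..., k_n) to the
            pair (a_k, b_k) of the term a_k cos(k.theta) + b_k sin(k.theta).
    Each term solves dF/dt = sum_j d^2F/dtheta_j^2 with the factor exp(-|k|^2 t).
    """
    out = {}
    for k, (a, b) in coeffs.items():
        damp = math.exp(-sum(ki * ki for ki in k) * t)
        out[k] = (a * damp, b * damp)
    return out

def energy(coeffs, theta):
    """Evaluate the Fourier series at the angles theta (a tuple)."""
    total = 0.0
    for k, (a, b) in coeffs.items():
        phase = sum(ki * th for ki, th in zip(k, theta))
        total += a * math.cos(phase) + b * math.sin(phase)
    return total
```

At t = 2, for instance, a term with |k|² = 1 keeps about 13% of its amplitude while a term with |k|² = 9 keeps about 10⁻⁸ of it, so truncating to the lowest orders costs almost nothing at large t.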
INTRODUCTION

Studies  of  the  structures  and  amino-acid  sequences  of  proteins,  and  the  relationships 
between  structure  and  function,  are  increasing  rapidly  in  terms  of  the  quantity  of  information 
available,  the  number  of  groups  engaged  in  the  field  and  the  areas  in  which  the  resultant 
knowledge  is  being  applied.  An  obvious  impetus  results  from  the  projects  to  determine  the 
entire  genetic  constitution  of  humans  and  other  species,  which  demands  interpretation  in 
terms  of  the  proteins  which  the  chromosomal  nucleic  acid  encodes.  Many  such  proteins  have 
been  characterised  solely  as  the  translations  of  open  reading  frames  of  the  nucleic  acid  code 
and are likely to be of unknown function and 3-dimensional structure. It seems clear, however, that protein molecules and their constituent domains belong to families that have evolved from a common ancestor and that there may well be only between one and two thousand such families. This number should be compared with the number of proteins for which sequences are so far known (over 80000 in the current release of the OWL composite, non-redundant database; Akrigg et al, 1992; Bleasby & Wootton, 1990) and the 500 or so different proteins whose 3-dimensional structures are known.

The  relative  difficulty  of  determining  experimentally  both  the  functions  and  the  3-D 
structures  of  proteins  means  that  there  will  be  an  ever-increasing  number  of  proteins  of 
potential  interest,  for  which  sequence,  but  not  structural,  information  is  available.  While  in 
some  cases  (for  example  the  same  protein  from  different  species,  or  proteins  that  are  very 
closely  related  in  function)  a  family  relationship  is  readily  revealed  by  comparison  of 
sequences,  it  is  clear  that  proteins  belonging  to  the  same  family,  as  evidenced  by  closely 
similar chain folds, may have such dissimilar sequences that the relationship is obscure, or at least statistically tentative. Nevertheless, there may be evidence of the relationship through the occurrence of one or more sequence motifs that together constitute a “fingerprint” or “signature” that is characteristic and thus diagnostic of family membership. An example is provided by the lipocalin family of proteins. The half dozen members of this family for which 3-dimensional structures have been determined experimentally are found to share a similar chain topology, the main feature of which is an 8-stranded β-barrel. Any two members of the family, however, may well have only about 20% sequence identity. Embedded within their sequences are three regions that include a pattern of invariant or similar residues which, once they have been located, are seen to be characteristic of the family. Such motifs may have a functional or a structural role.

Further  properties  of  the  amino-acid  sequences  of  proteins  that  may  be  indicative  of 
family  relationships  include  the  periodicities  of  amino-acid  type  that  are  characteristic  of 
secondary  structure  and  the  patterns  of  hydrophobic  and  hydrophilic  side-chains  that  are 
indicative  of  internal  or  external  environment  or  of  an  intramembranous  location. 

While  a  number  of  automatic  alignment  procedures  have  been  devised,  these  are  in 
general  successful  only  when  sequence  similarity  is  fairly  high.  When  the  similarity  is  lower 
than  about  20%  identity,  with  the  implication  that  there  may  be  insertions  or  deletions  in  one 
or  other  sequence,  automatic  alignment  becomes  problematic.  Manual  alignment  procedures 
may  nevertheless  be  successful  and  we  have  found  that  colour-coding  according  to  amino 
acid  class  may  reveal  patterns  that  the  eye  can  pick  up  even  when  overall  similarity  is  low. 
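To illustrate why colour-coding by residue class helps when identity is low, the sketch below scores two already-aligned sequences both by identity and by agreement of residue class, using the seven classes of the default VISTAS colour scheme (one-letter codes, with Cys as a single class). Counting only columns where neither sequence has a gap is our assumption; other conventions for percent identity exist.

```python
# Residue classes following the default VISTAS colour scheme
CLASSES = {
    **dict.fromkeys("AVILM", "alkyl"),
    **dict.fromkeys("FYW", "aromatic"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("RHK", "basic"),
    **dict.fromkeys("STNQ", "polar"),
    "C": "cys",
    **dict.fromkeys("GP", "special"),
}

def similarity(seq_a, seq_b, gap="-"):
    """Percent identity and percent same-class over aligned, ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
            if a != gap and b != gap]
    if not cols:
        return 0.0, 0.0
    same = sum(a == b for a, b in cols)
    same_class = sum(CLASSES.get(a, a) == CLASSES.get(b, b) for a, b in cols)
    return 100.0 * same / len(cols), 100.0 * same_class / len(cols)
```

For example, `similarity("AVDK-F", "LIEKGY")` returns `(20.0, 100.0)`: only one of the five ungapped columns is identical, yet every pair falls in the same class, which is exactly the kind of pattern colour-coding makes visible.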

When  a  3 -dimensional  structure  is  available  for  one  or  more  members  of  a  protein 
family,  alignment  of  sequences  is  greatly  facilitated  by  the  knowledge  that  insertions  and 
deletions  usually  occur  in  surface  loops  and  that  structure-dependent  properties  of  the  aligned 
sequences  should  be  consistent. 

The  above  considerations  demand  computer  software  that  allows  for  the  input  and 
interactive  alignment  of  protein  sequences,  the  pictorial  display  of  3 -dimensional  structure, 
the  graphical  display  of  amino-acid  properties  as  a  function  of  position  within  the  sequence 
and  access  to  databases  of  sequences,  structures  and  motifs  for  the  retrieval  and  deposition 
of data as required. This contribution describes the development of two intercalated databases
and associated software which make up a storage, interrogation and retrieval system
for  the  analysis  of  protein  sequence  and  structure. 

The increasing number of scientists in academic and industrial laboratories whose research would be facilitated by the software includes many whose use of computers is only occasional and who therefore require software that is easy and straightforward to use. Windows-based and menu-driven software has now become widespread in place of command-line driven programs, and this has governed our approach in the development of the VISTAS suite described in this paper. The software has been designed on a modular basis and in such a way that it can be implemented on a variety of hardware platforms. The illustrations in this paper are based on the Silicon Graphics Iris implementation, but the suite is also operational on the Digital Alpha AXP range and can readily be ported to other Unix-based systems.


DATABASES 

At  the  core  of  our  system  are  two  data-storage  elements  which  together  with 
associated  software  can  be  accessed  at  the  UK  EMBnet  Node  at  Daresbury  (SEQNET),  the 
anonymous  FTP  address  for  which  is  s-ind2.dl.sc.UK.  The  first  is  a  non-redundant  composite 
protein  sequence  database  (OWL)  which  at  its  last  update  contained  nearly  80,000  entries, 
drawn from SWISS-PROT (Bairoch & Boeckmann, 1991), NBRF-PIR 1 (George et al, 1986), NBRF-PIR 2, -PIR 3, NRL-3D (Namboodiri et al, 1989) and GenBank (translation) (Burks et al, 1989; Pickett, 1986). The source databases are assigned a priority with respect to sequence validation and their contents are amalgamated. Redundant or trivially different entries are eliminated according to defined criteria using the COMPO suite of programs (Bleasby & Wootton, 1990). OWL is in the NBRF format for compatibility with established search software and is interrogated using the query language DELPHOS, which allows retrieval of sequence and textual material in the database.
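The amalgamation step can be sketched as a priority-ordered merge. The actual COMPO criteria for "trivially different" entries are not described here, so this hypothetical sketch removes only exact duplicates (ignoring case and spaces), keeping the copy from the highest-priority source; the source names and entry codes in any example are made up.

```python
def build_nonredundant(sources):
    """Merge sequence databases in priority order, dropping exact duplicates.

    sources: list of (source_name, entries) in decreasing priority, where
             entries is a list of (entry_id, sequence) pairs.
    Returns a list of (source_name, entry_id, sequence).  When the same
    sequence occurs more than once, the highest-priority copy is kept.
    """
    seen = set()
    merged = []
    for name, entries in sources:
        for entry_id, seq in entries:
            key = seq.upper().replace(" ", "")
            if key in seen:
                continue  # redundant: identical sequence already stored
            seen.add(key)
            merged.append((name, entry_id, seq))
    return merged
```

Because the loop visits sources in priority order, the validation priority mentioned above decides which copy of a duplicated sequence survives.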

Other modules in the system include SWEEP, which incorporates best-local and complete sequence alignment algorithms based on the approach of Lipman & Pearson (1985) and which allows database searches with complete sequences. ADSP (Parry-Smith & Attwood, 1991) is an associated package which permits multiple sequence alignment and manipulation, local similarity detection and the development of discriminating sequence-based features. ADSP was designed to permit rigorous, iterative development of pattern-recognition discriminators which are diagnostic of the structural and functional characteristics of proteins or protein domains. These can then be compiled into a new second-generation biological database (PRINTS) containing the discriminators along with all the information (references, scan histories, etc.) and commentaries relevant to each PRINTS entry. This new database currently contains 250 entries, almost all of which encompass multiple discriminators (2000 in total), i.e. the set of sequence motifs characteristic of the protein. The information is rapidly addressed and retrieved by the query language SMITE (Bleasby, unpublished), which shares a common syntax with DELPHOS. The database differs from PROSITE in that protein families are usually characterized by more than one motif.
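A simplified sketch of the fingerprint idea: a family is described by several motifs, and a sequence is scored by how many of them it contains, in the expected order. Each motif is written here as a Python regular expression purely for brevity; PRINTS discriminators are derived from aligned sequence motifs rather than written as patterns, and the motifs in any example are hypothetical.

```python
import re

def fingerprint_hits(sequence, motifs):
    """Match a set of motifs (regular expressions) in order along a sequence.

    Each motif is searched for after the end of the previous match, so the
    result reflects both the presence and the order of the elements.
    Returns (number_of_motifs_matched, start_positions_or_None).
    """
    pos, positions = 0, []
    for motif in motifs:
        m = re.compile(motif).search(sequence, pos)
        if m is None:
            positions.append(None)   # this element of the fingerprint is missing
            continue
        positions.append(m.start())
        pos = m.end()
    return sum(p is not None for p in positions), positions
```

With the hypothetical motifs `["G.W", "TD[YF]", "R..C"]`, the sequence `"MAGSWKKTDYAARLLC"` matches all three (at positions 2, 7 and 12), whereas `"RLLCTDYGSW"` contains the same elements in the wrong order and matches only one.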

The  discovery  of  multiple  discriminators  and  the  development  of  an  associated  search 
system  arose  from  the  study  of  membrane-bound  G-protein  linked  receptors.  Even  though 
sequence  identity  is  well  below  the  statistically  significant  level  in  these  proteins,  it  is  clear 
that  the  substitution  patterns  for  particular  positions  and  regions  in  the  various  sequences 
fall within fairly strictly defined limits. Thus it is possible, based on the primary structure data alone, to compile a series of 7 discriminators, each describing a transmembrane segment (Attwood & Findlay, 1994). This analysis reveals that each of the transmembrane segments has its own special identity, which clearly indicates features of structural importance within the super-family of receptors. Within this overall class of receptor there are sub-families which have their own distinctive elements. Furthermore, there are 7-transmembrane proteins which form their own special sub-groups, related or completely unrelated to the G-protein linked receptors, e.g. the secretin sub-family.

The database now contains a large number of protein families and domains which possess multiple discriminators. In many cases individual motifs had been identified previously. These discriminators describe areas which may be of either structural or functional significance. In the case of the lipocalins mentioned earlier, we have discovered other members of the family, such as a protein known hitherto as the “mouse 24p3 oncogene product”, to which we are now able to ascribe both membership of a structural family and a putative function (Flower et al, 1991, 1993).

All this information has been amalgamated into a single interactive program, VISTAS, which is described below.


OPERATION  AND  DESIGN  OF  THE  VISTAS  PROGRAM 

The principal facilities of the program are menu-driven, with menus and menu items selected using the mouse buttons. Keyboard entries are required only for the initial selection of data and parameter files and for communication with external programs for the retrieval and storage of data.
The two main types of data that VISTAS must manipulate are those related to structural and sequence information. Data for the pictorial display of 3-D structure are maintained in a pre-defined 3-D array corresponding to the scaled x, y and z co-ordinates of the atoms. As the data required to draw van der Waals and full-bond representations are potentially very memory-consuming, a linked list is used, with each structure being allocated when needed by using the dynamic memory allocation facilities of the C programming language.

Sequence data are stored in pre-defined structures rather than as linked lists. While
this  may  be  wasteful  of  memory  and  potentially  limit  the  length  and  number  of  sequences 
that  may  be  manipulated,  it  allows  the  program  very  quickly  to  locate  selected  sequences 
and  residues  within  them  without  having  to  move  along  a  linked  list. 


SOURCE  DATA  FORMATS  AND  DEFAULT  PARAMETERS 

Sequence data are handled in the standard NBRF/PIR format and may be read in as
single sequences or as aligned sets of sequences. Coordinates of 3-D structures are read in
standard PDB format. The program can handle protein molecules with one or more chains
and can also represent ligands. Default files are available for colouring residue types on the
screen and on PostScript output. By default, the program uses the colours we have previously
adopted for classifying residues, viz.:

• grey: alkyl chains (Ala, Val, Ile, Leu, Met)

•  purple:  aromatic  residues  (Phe,  Tyr,  Trp) 

•  red:  acidic  residues  (Asp,  Glu) 

•  blue:  basic  residues  (Arg,  His,  Lys) 

• green: polar uncharged residues (Ser, Thr, Asn, Gln)

•  yellow:  sulphydryl  and  disulphide  residues  (CysH,  Cys) 

•  brown:  conformational  exceptions  (Gly,  Pro) 

Users may specify any number of alternative colour schemes if they wish. Other
default parameters for which alternatives can be specified include the colours used to
indicate variations in properties such as hydrophobicity, substitution matrix values and
secondary structure propensity data.
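A minimal sketch of reading NBRF/PIR entries and applying the default colour classes listed above. It assumes the usual PIR layout (a `>P1;` identifier line, one description line, then sequence lines terminated by `*`); the function names, the use of one-letter residue codes, and the fallback colour `white` for gaps or unknown symbols are illustrative assumptions, not VISTAS internals. One-letter codes cannot distinguish CysH from Cys, so both map to yellow here.

```python
# Default colour classes from the list above, keyed by one-letter residue code.
DEFAULT_COLOURS = {
    **dict.fromkeys("AVILM", "grey"),    # alkyl chains
    **dict.fromkeys("FYW", "purple"),    # aromatic residues
    **dict.fromkeys("DE", "red"),        # acidic residues
    **dict.fromkeys("RHK", "blue"),      # basic residues
    **dict.fromkeys("STNQ", "green"),    # polar uncharged residues
    "C": "yellow",                       # CysH and Cys share the code C
    **dict.fromkeys("GP", "brown"),      # conformational exceptions
}


def parse_pir(text):
    """Return (identifier, description, sequence) tuples from NBRF/PIR text."""
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line.startswith(">"):
            continue
        identifier = line.split(";", 1)[1].strip()   # ">P1;ID" -> "ID"
        description = next(lines, "").strip()
        chunks = []
        for seq_line in lines:                        # consume lines up to the '*'
            chunks.append(seq_line)
            if "*" in seq_line:
                break
        sequence = "".join(chunks).split("*", 1)[0]
        entries.append((identifier, description, "".join(sequence.split())))
    return entries


def colour_sequence(seq, overrides=None):
    """Colour per residue; `overrides` stands in for a user-supplied scheme."""
    scheme = {**DEFAULT_COLOURS, **(overrides or {})}
    return [scheme.get(residue, "white") for residue in seq]  # gaps/unknowns -> white
```

A user-defined colour scheme is modelled here simply as a dictionary that overrides individual defaults.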


THE  SEQUENCE  WINDOW 

This  window  allows  the  display  and  manipulation  of  up  to  20  sequences  at  a  time  out 
of  up  to  500  stored  in  the  computer.  The  sequences  are  colour-coded,  initially  according  to 
side-chain  property.  The  sequences  may  be  scrolled  vertically  or  horizontally  by  use  of  the 
1  and  r  keys. 

The  mouse  cursor  may  be  used  to  mark  positions  and  control  insertions  or  deletions, 
thus  allowing  sequences  to  be  edited  and  aligned.  The  colour  coding  is  found  to  be  valuable 
in  achieving  alignments,  which  frequently  depend  upon  the  recognition  of  similarities  of 
property  rather  than  of  identities. 

Sequences  may  be  grouped  so  that  insertions  or  deletions  are  incorporated  similarly 
in  all  members  of  the  group,  and  anchor  points  may  be  defined  to  allow  insertions  or  deletions 
to  be  made  while  leaving  established  parts  of  the  alignment  intact.  The  order  in  which 
sequences  appear  in  the  window  may  be  changed  to  facilitate  comparison.  New  sequences 
may  be  added  to  the  alignment,  unwanted  ones  deleted  and  alignments  stored  for  future  use. 
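Group-wise gap insertion with an anchor might work as in the sketch below. The anchoring behaviour modelled here is an assumption: a gap inserted at column `col` is absorbed by the nearest existing gap before the anchor column, so every column from the anchor onwards is left untouched. Without an anchor, the sequences outside the group are padded at the end to keep the alignment rectangular. `insert_gap` is a hypothetical name.

```python
def insert_gap(alignment, group, col, anchor=None):
    """Insert a gap at 0-based column `col` in every sequence named in `group`.

    alignment: dict name -> list of characters ('-' = gap), all the same length.
    anchor: optional column > col; columns from `anchor` onwards stay fixed
    because the shift is absorbed by an existing gap in [col, anchor).
    """
    members = [alignment[name] for name in group]
    if anchor is not None:
        # Validate every member first so a failure leaves the alignment unchanged.
        absorb = []
        for name, row in zip(group, members):
            gaps = [i for i in range(col, anchor) if row[i] == "-"]
            if not gaps:
                raise ValueError(f"{name}: no gap between column {col} and anchor {anchor}")
            absorb.append(gaps[0])  # nearest gap shifts the fewest residues
        for row, gap_at in zip(members, absorb):
            del row[gap_at]
            row.insert(col, "-")
    else:
        for name, row in alignment.items():
            if name in group:
                row.insert(col, "-")
            else:
                row.append("-")  # pad the others so the alignment stays rectangular
```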

Sequence motifs may be defined with the mouse and then stored, with their locations
also shown in the structure and graph windows. The selected motifs may also be used to scan
the OWL database for their occurrence in other proteins.

Integration  of  the  scanning  procedures  with  VISTAS  yields  two  major  benefits:  first, 
any  additional  protein  sequences  that  are  identified  as  being  of  interest  may  immediately  be 
displayed  in  the  sequence  window  and  analysed;  second,  newly  identified  motifs  may  be  fed 
into  the  PRINTS  definition  and  refinement  modules,  thereby  updating  the  PRINTS  database. 
The “Find motif” menu command allows the user to type in a short motif sequence; VISTAS
then carries out a fuzzy search for occurrences of the motif within the displayed sequences.
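The text does not define what “fuzzy” means in the “Find motif” search. The sketch below assumes it means allowing up to a fixed number of mismatched residues (a Hamming-distance search) and ignores alignment gaps, reporting 1-based positions in the ungapped sequence. `find_motif` and `max_mismatches` are hypothetical names.

```python
def find_motif(motif, sequences, max_mismatches=1):
    """Return (name, 1-based start, matched window, mismatches) for each hit.

    sequences: dict name -> sequence, possibly containing '-' alignment gaps.
    """
    hits = []
    width = len(motif)
    for name, seq in sequences.items():
        residues = seq.replace("-", "")  # search the ungapped sequence
        for start in range(len(residues) - width + 1):
            window = residues[start:start + width]
            mismatches = sum(a != b for a, b in zip(motif, window))
            if mismatches <= max_mismatches:
                hits.append((name, start + 1, window, mismatches))
    return hits
```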

Automatic alignment procedures may be invoked as an alternative to manual alignment
through interfaces to appropriate programs, including CLUSTAL, and the global
sequence searching programs FASTA and SWEEP may be used to extract and display
sequences related to those previously selected.

Navigation  through  the  sequences  is  facilitated  by  a  ruler  which  may  be  set  to  indicate 
the  residue  numbers  corresponding  to  any  one  of  the  individual  sequences  displayed  or  to 
the  alignment  as  a  whole. 
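A ruler keyed to one displayed sequence must skip gap columns when counting residues. A minimal sketch, assuming `-` marks a gap and residues are numbered from 1; for a ruler keyed to the alignment as a whole, the column index itself would be used. The function names are hypothetical.

```python
def residue_number(aligned_seq, column, first_residue=1):
    """Residue number at 0-based alignment `column`, or None over a gap ('-')."""
    if aligned_seq[column] == "-":
        return None
    return first_residue + column - aligned_seq[:column].count("-")


def ruler_labels(aligned_seq, every=10):
    """Map alignment column -> residue number for every `every`-th residue."""
    labels = {}
    for column in range(len(aligned_seq)):
        number = residue_number(aligned_seq, column)
        if number is not None and number % every == 0:
            labels[column] = number
    return labels
```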


THE  STRUCTURE  WINDOW 

This  window  allows  the  display  of  the  3-D  structure  of  the  protein  corresponding  to 
one  of  the  sequences,  pre-selected  on  entry  to  VISTAS.  In  designing  VISTAS,  no  attempt 
was  made  to  emulate  the  many  comprehensive  modelling  packages  that  are  now  widely 
available, but a number of basic manipulative features are provided: scaling, clipping,
translation and rotation. Alternative styles of display include Cα trace, skeletal,
Cα ball-and-spoke, all-atom ball-and-spoke, space-filling and van der Waals atoms. Protein
ligands may be displayed and manipulated independently of the protein itself. Ligand
atoms may be depicted with double van der Waals radii while the protein is represented in
skeletal form; this helps in searching for close contacts with the protein and hence in
pinpointing the residues involved in ligand interactions.
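The double-radius display amounts to a distance test: with the protein drawn in skeletal form, a protein atom appears inside an enlarged ligand sphere when its distance from the ligand atom is at most twice that atom's van der Waals radius. The sketch below makes the test explicit. The radii are the commonly used Bondi values, the data layout is assumed, and `contact_residues` is a hypothetical name.

```python
import math

VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}  # Bondi radii, angstroms


def contact_residues(ligand_atoms, protein_atoms, factor=2.0):
    """Residues with an atom inside any ligand sphere of radius factor * r_vdW.

    ligand_atoms: list of (element, (x, y, z));
    protein_atoms: list of (residue_id, (x, y, z)).
    """
    hits = set()
    for element, ligand_pos in ligand_atoms:
        cutoff = factor * VDW_RADIUS[element]
        for residue_id, protein_pos in protein_atoms:
            if math.dist(ligand_pos, protein_pos) <= cutoff:
                hits.add(residue_id)
    return sorted(hits)
```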

The structures can be coloured by residue type (in conformity with the sequence
window) or, as described below, to highlight selected motifs or specified properties. The
matrix corresponding to a chosen view of the molecule can be stored for later use.
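A stored view is, in essence, the accumulated rotation (plus any scaling and translation) that is reapplied to the coordinates. The sketch below composes rotations about the z axis and applies the result; the 3×3 list-of-lists representation, the `saved_views` store and all function names are assumptions for illustration.

```python
import math


def rotation_z(theta):
    """3x3 rotation matrix for an angle `theta` (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def compose(a, b):
    """Matrix product a*b: applying the result rotates by b first, then by a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply_view(matrix, coords, scale=1.0, shift=(0.0, 0.0, 0.0)):
    """Rotate, scale and translate (x, y, z) coordinates with a stored view."""
    result = []
    for point in coords:
        rotated = [sum(matrix[i][k] * point[k] for k in range(3)) for i in range(3)]
        result.append(tuple(scale * r + t for r, t in zip(rotated, shift)))
    return result


saved_views = {}  # hypothetical store: view name -> (matrix, scale, shift)
```

For example, `saved_views["active site"] = (view, 1.0, (0.0, 0.0, 0.0))` would record a view that can later be reapplied with `apply_view`.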


THE  GRAPHICS  WINDOW 

This  window,  which  is  invoked  optionally,  allows  the  display  of  a  variety  of 
properties  as  a  function  of  position  in  the  sequence.  The  properties  include: 

1. Positional variability, i.e. the number of different types of amino acid that occur
at a particular position in an aligned set of sequences. This may be a simple count
of different residue types or it may be a weighted sum, with weights derived from
a specified substitution matrix (e.g. that of Risler et al. (1988)).

2. Secondary structure propensity, as determined by the Garnier, Osguthorpe &
Robson (1978) algorithm. The graphical display shows the computed propensities
for the four canonical conformations (turn, coil, α helix, β strand), and the
sequence alignment and structure may be coloured accordingly. Although secondary
structure prediction remains notoriously unreliable, the use of an alignment
allows the joint propensities to be evaluated, which improves prediction and,




J.  B.  C.  Findlay  et  al. 


taken  together  with  the  known  structure  of  at  least  one  member  of  a  family,  is  an 
aid  to  homology  modelling. 

3. Hydropathy, as determined by one of three methods, those of Kyte & Doolittle
(1982), Sweet & Eisenberg (1983) or Engelman et al. (1986). Again, the calculation
can be derived from a single sequence or from the aligned set, and the
sequences and structure may be coloured to show the calculated value.

4. Solvent-accessible area, using the scale that Rose et al. (1985) derived by
calculating the mean solvent-accessible area of each residue type in 23 proteins
of known structure, evaluated with a window length of five residues.

5. Flexibility, using a scale derived by Ragone et al. (1989) from the mobility of
residues (as evidenced by the thermal parameter, B) in proteins of known structure,
evaluated with a window length of ten residues.
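Two of the profiles listed above can be sketched briefly: the simple (unweighted) count of residue types per alignment column, and a sliding-window average of a per-residue scale. The Kyte & Doolittle (1982) hydropathy values serve as the example scale; averaging each column over the aligned set before windowing is one assumed way of using an alignment, and the function names are hypothetical. The weighted variability and the Garnier, Osguthorpe & Robson method are not reproduced here.

```python
# Kyte & Doolittle (1982) hydropathy values, one-letter codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def positional_variability(aligned):
    """Unweighted count of distinct residue types per column; gaps are ignored."""
    return [len({seq[i] for seq in aligned} - {"-"}) for i in range(len(aligned[0]))]


def column_means(aligned, scale):
    """Mean scale value per column over non-gap residues (all-gap columns -> 0.0)."""
    means = []
    for i in range(len(aligned[0])):
        values = [scale[seq[i]] for seq in aligned if seq[i] != "-"]
        means.append(sum(values) / len(values) if values else 0.0)
    return means


def window_profile(values, window):
    """(1-based centre position, window mean) for each full window of `window` values."""
    half = window // 2
    return [(start + half + 1, sum(values[start:start + window]) / window)
            for start in range(len(values) - window + 1)]
```

For a single sequence, the profile is `window_profile([KYTE_DOOLITTLE[r] for r in seq], 5)`; for an aligned set, `window_profile(column_means(aligned, KYTE_DOOLITTLE), 5)`.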

Clearly, hydropathy, solvent-accessible area and flexibility are closely correlated, but
each reflects a different aspect of amino-acid properties and so is a separate aid to sequence
alignment and the detection of homology.


INTEGRATION  OF  WINDOWS 

As indicated in preceding sections, the sequence and structure displays are both
initially coloured by residue type; when a graph of properties is invoked, the sequence and
structure displays normally become coloured to represent the variation in those properties.
It is, however, straightforward to use different colour schemes in all three windows;
for example, the sequence could be coloured according to residue type and the structure
according to residue variability while the graph displays hydrophobicity.

Further features of the program include the ability to point to a residue, or a run
of residues, in any of the windows and have it highlighted in the other windows.
Thus, an invariant residue (colour-coded as having zero variability in the graph window)
may be located in the structure window, or a region shown in the graph window as likely
to be very flexible can be located in the sequence and structure windows. Motifs selected in
any of the windows are highlighted in the others, and their sequences may be written to
a file.
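The linked highlighting can be pictured as a simple publish/subscribe arrangement, sketched below under the assumption that all windows share a common residue numbering (in practice a mapping such as the sequence ruler's would be needed). `SelectionHub` and `Window` are hypothetical classes, not VISTAS components.

```python
class SelectionHub:
    """Relays a selection made in one window to every other registered window."""

    def __init__(self):
        self.windows = []

    def register(self, window):
        self.windows.append(window)

    def select(self, source, residues):
        for window in self.windows:
            if window is not source:
                window.highlight(residues)


class Window:
    def __init__(self, name, hub):
        self.name = name
        self.highlighted = set()
        self.hub = hub
        hub.register(self)

    def highlight(self, residues):
        self.highlighted = set(residues)

    def user_selects(self, residues):
        # Highlight locally, then notify the other windows via the hub.
        self.highlight(residues)
        self.hub.select(self, residues)
```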


INTERFACES  TO  OTHER  PROGRAMS 

Reference has already been made to the ways in which the display features of VISTAS
interact with other procedures. These fall into three categories. First, the PRINTS database
scanning  module  is  an  integral  part  of  VISTAS.  Motifs  selected  by  the  user  of  VISTAS  can 
be  submitted  directly  to  a  motif  database-scanning  routine  with  similar  functionality  to  that 
of  ADSP  (Parry-Smith  &  Attwood,  1991),  which  adopts  an  iterative  procedure  to  refine  a 
matrix  describing  the  motifs  in  terms  of  the  frequency  of  occurrence  of  the  20  types  of  amino 
acid  at  each  position  of  each  motif.  At  each  stage  of  the  refinement,  a  hit-list  is  generated  of 
the protein sequences that include the motifs. The amino acids occurring in any newly
identified family members are then incorporated to modify the matrix appropriately. The SCAN
and  COMPARE  algorithms  of  VISTAS  allow  the  user  to  search  the  existing  entries  in  the 
PRINTS  database  for  any  potential  motif  identified  by  inspection  of  the  sequence,  structure 
and  graphics  windows  and  then  to  update  existing  entries,  or  create  new  entries,  in  PRINTS 
by  invoking  the  scanning  and  analysis  procedures. 
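The iterative refinement described above can be sketched as follows. This is a simplified illustration in the spirit of ADSP, not its actual algorithm: the matrix holds per-position residue frequencies, each database sequence is scored by its best-matching window (mean frequency over positions), sequences scoring at or above an assumed `threshold` form the hit-list, and their matching windows are fed back into the matrix until the hit-list stops changing. All names are hypothetical.

```python
from collections import Counter


def frequency_matrix(instances):
    """Per-position residue frequencies (fractions) across equal-length motif instances."""
    n = len(instances)
    return [{aa: count / n for aa, count in Counter(inst[i] for inst in instances).items()}
            for i in range(len(instances[0]))]


def window_score(window, matrix):
    """Mean frequency of the window's residues at their positions (0 to 1)."""
    return sum(col.get(res, 0.0) for col, res in zip(matrix, window)) / len(matrix)


def refine_motif(seeds, database, threshold, max_rounds=10):
    """Rebuild the matrix from hits until the hit-list stops changing.

    database: dict name -> sequence. Returns (motif instances, sorted hit names).
    """
    instances, hits = list(seeds), {}
    for _ in range(max_rounds):
        matrix = frequency_matrix(instances)
        width = len(matrix)
        new_hits = {}
        for name, seq in database.items():
            windows = (seq[i:i + width] for i in range(len(seq) - width + 1))
            best = max(windows, key=lambda w: window_score(w, matrix), default=None)
            if best is not None and window_score(best, matrix) >= threshold:
                new_hits[name] = best
        if new_hits.keys() == hits.keys():
            break  # converged: no sequences gained or lost
        hits = new_hits
        # Newly identified family members contribute their matching windows.
        instances = list(seeds) + [w for w in hits.values() if w not in seeds]
    return instances, sorted(hits)
```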


Protein  Sequence  Analysis,  Storage  and  Retrieval 




Second,  the  DELPHOS  software  used  to  scan  the  OWL  database  may  be  called 
directly  from  VISTAS  in  a  separate  window;  DELPHOS  allows  OWL  to  be  scanned,  either 
by obligatory or fuzzy searches, for the occurrence of whole or partial sequences, bibliographic
or textual information in the databases. Boolean procedures allow the combination
of  queries  -  for  example,  for  the  occurrence  of  the  sequence  “A*CDEF”  in  a  protein  from 
“Homo  sapiens”  in  a  paper  by  “Smith  &  Jones”. 
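The Boolean combination of queries can be illustrated with composable predicates. The meaning of `*` in a pattern such as “A*CDEF” is an assumption here (any single residue), as are the entry fields and all function names; this is not the DELPHOS query syntax.

```python
import re


def sequence_pattern(pattern):
    """Compile a pattern in which '*' is assumed to match any single residue."""
    return re.compile("".join("." if ch == "*" else re.escape(ch) for ch in pattern))


def has_sequence(pattern):
    regex = sequence_pattern(pattern)
    return lambda entry: regex.search(entry["sequence"]) is not None


def field_contains(field, text):
    # Case-insensitive substring test on a textual field of an entry.
    return lambda entry: text.lower() in entry.get(field, "").lower()


def all_of(*predicates):
    return lambda entry: all(p(entry) for p in predicates)


def any_of(*predicates):
    return lambda entry: any(p(entry) for p in predicates)


def negate(predicate):
    return lambda entry: not predicate(entry)
```

The example query in the text then becomes `all_of(has_sequence("A*CDEF"), field_contains("organism", "Homo sapiens"), field_contains("authors", "Smith"), field_contains("authors", "Jones"))`.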

Third, VISTAS can initiate external programs as background jobs, such as the FASTA
and SWEEP global sequence searching programs.


THE  ALIGN  PROGRAM 

ALIGN  is  essentially  a  version  of  VISTAS  that  is  designed  to  offer  the  sequence  and 
graphical  windows  of  VISTAS  but  without  the  structure  display  facilities.  It  is  aimed  at 
studies of sequences and sequence alignments in cases where no three-dimensional structure
information  is  available.  The  omission  of  the  structure  display  window  allows  the  interactive 
display  of  a  much  larger  number  of  sequences. 


COMPARISON  WITH  OTHER  SOFTWARE  WITH  RELATED 
FUNCTIONALITY 

Several other systems offer some of these facilities, but none, to our knowledge,
offers the combination found in VISTAS for integrated and user-friendly analysis of
protein sequences, structure and properties.


THE ORIGINAL SUPERFAMILY CONCEPT

In the mid-1970s, Dayhoff proposed that all naturally occurring proteins would
cluster  into  families  and  superfamilies  whose  members  have  diverged  from  common 
ancestral  forms  (Dayhoff  et  al.,  1975;  Dayhoff,  1976).  A  similar  proposal  was  made 
by  Emil  Zuckerkandl  (1975).  Estimates  of  the  number  of  protein  superfamilies  were 
in  the  low  thousands.  Recently  this  estimate  has  been  reassessed.  Using  a  variety  of 
criteria  for  superfamily  membership,  estimates  of  the  same  order  of  magnitude  as  the 
original  Dayhoff  estimate  have  been  reported  (Green  et  al.,  1993;  Gonnet  et  al.,  1992; 
Chothia,  1992). 

Although  superfamily  relationships  in  some  cases  would  be  so  ancient  as  to 
preclude  recognition  solely  on  the  basis  of  sequence  similarity,  our  group  used  sequence 
similarity  as  the  main  criterion  for  partitioning  the  Protein  Sequence  Database  into 
independent,  nonoverlapping  groups.  The  nearly  500  completely  sequenced  proteins  then 
known  were  each  assigned  to  one  of  116  superfamilies  (Dayhoff  et  al.,  1976).  At  that 
time,  there  were  no  examples  in  the  database  of  complete  precursor  sequences,  of 
polyproteins, or of products of alternative splicing of mRNA. Most of the known sequences
were of mature forms of peptides or proteins. The longest completely sequenced
proteins were 500-600 residues (serum albumin, prothrombin, immunoglobulin mu heavy
chains, glutamate dehydrogenase). There were a few examples of the existence of sequence
regions (domains) in some members of a superfamily but not in others. Prothrombin
clearly had a large amino-terminal extension with respect to trypsinogen and other known
members of this superfamily, and that unique region showed evidence of internal
duplication. There were also several examples of members of a superfamily containing variable
numbers of such related sequence segments, which we called “homology regions.”
Pseudomonas rubredoxin contained two domains homologous to the other known
rubredoxins, with an unrelated segment in between. Immunoglobulin C regions contained from
one to four homologous domains; parvalbumin contained two domains that were clearly
related to the four domains of troponin C and myosin light chains; and the known
apolipoproteins contained from four to 18 repeats of an 11-residue pattern (Barker &
Dayhoff, 1977).

The superfamily classification scheme provided an effective architecture for the
intercomparison, correlation, and analysis of the nonsequence data associated with
sequences within any particular homology class. Members of a superfamily were partitioned
into closely related groups (families) of proteins, usually homologs in various species or
products of more recent gene duplications; these could reasonably be expected to share many
structural and functional characteristics. Indeed, much of the biological information
concerning protein sequences reported in the published literature has been inferred by homology
with closely related sequences. These observations have recently been confirmed by a
systematic study (Sander & Schneider, 1991).

Homology  provides  a  sound  basis  for  the  verification  of  information  in  the  Protein 
Sequence  Database  and  for  induction  of  new  knowledge  concerning  these  data.  Given  that 
there  is  justification  for  such  inference  among  members  of  homology  classes,  new  sequences 
can  directly  inherit  annotation  information  associated  with  existing  homologous  sequences. 
This provides a mechanism for comprehensive and consistent annotation throughout the
database.  Moreover,  as  new  experimental  information  becomes  available,  it  can  be  applied 
uniformly  to  entire  classes  of  homologous  sequences. 


PROTEIN  DOMAINS:  LIMITATIONS  OF  THE  ORIGINAL 
SUPERFAMILY  CONCEPT 

Within  a  few  years  of  the  introduction  of  the  superfamily  concept,  it  became  evident 
that  many  protein  sequences  contained  regions  of  local  similarity  with  otherwise  unrelated 
proteins.  In  many  cases  such  domains  were  clearly  responsible  for  the  similar  properties 
(such  as  calcium  binding,  DNA  binding,  or  catalytic  activity)  shared  by  diverse  proteins.  In 
other  cases  the  properties  associated  with  a  particular  domain  remained  to  be  discovered. 
Evidence from X-ray crystallography and chemical studies revealed that these domains often
corresponded with compact regions of the structure or with easily cleaved fragments. More
surprising  was  the  discovery  that  the  genes  for  many  proteins  contain  noncoding  regions 
(introns)  that  divide  the  protein  coding  region  into  exons  that  sometimes,  but  not  always, 
approximately  correspond  with  the  domains  as  defined  by  structure  or  protein  sequence. 
“Exon  shuffling”  among  genes  is  now  recognized  as  an  important  mechanism  in  the 
evolution  of  “new”  proteins. 

In  the  literature,  terms  such  as  “the  immunoglobulin  superfamily”  came  to  mean  the 
collection  of  all  proteins  that  contain  one  or  more  immunoglobulin-related  domains 
(Hunkapiller  &  Hood,  1986;  Williams  &  Barclay,  1988).  Initially  the  members  that  were  not 
classical  immunoglobulins  (beta-2  microglobulin,  T-cell  receptor  chains,  poly-Ig  receptor) 
contained  only  various  numbers  of  immunoglobulin-like  domains,  but  later  such  domains 
were found associated with various other domains including protein kinase and
protein-tyrosine-kinase domains. As the term “superfamily” is commonly used, these and other
multidomain  proteins  would  be  placed  into  (at  least)  two  superfamilies.  In  such  a  usage,  the 
superfamily  concept  fails  to  partition  the  sequence  data.  For  the  general  scientific  public, 
this  may  be  of  little  concern,  but  for  those  trying  to  organize  sequence  data  it  creates 
dilemmas. 


PROTEIN DOMAINS AND EVOLUTIONARY STUDIES

Because  different  domains  in  proteins  may  have  different  evolutionary  origins  and 
histories,  a  complete  and  accurate  understanding  of  the  evolution  of  a  given  protein  may 
require  treating  each  domain  of  the  protein  as  a  separate  entity.  Only  when  it  can  be  shown 
that  the  evolutionary  history  of  each  domain  is  “congruent”  (Nakayama  et  al.,  1992)  with 
the history of the entire molecule is one justified in constructing a phylogeny encompassing
the entire molecule. Unfortunately, many domains are rather small, so derived topologies
are often based on very few useful and unambiguously informative sites. This principle is
well illustrated by the extensive studies of the evolution of proteins containing EF-hand
domains (Moncrief et al., 1990; Nakayama et al., 1992). These proteins, which contain from
two to eight repeats of the domain, have been clustered by these workers into 29 types. Of
ten types that each contain four repeats of the domain, eight were judged to have evolved
from a common four-domain ancestor, which itself was formed by duplication of a
two-domain precursor. These include the well-known calmodulin, troponin C, myosin essential light
chain, and myosin regulatory light chain, as well as caltractin (Chlamydomonas) and CDC31
(Saccharomyces), squidulin from Loligo, call from Caenorhabditis, and calcium-dependent
protein  kinase  from  soybean.  Two  other  types,  calpain  and  sarcoplasmic  calcium-binding 
protein,  have  each  evolved  independently  into  the  four-domain  form  from  a  single-domain 
ancestor.  It  would  be  misleading  to  derive  an  evolutionary  model  based  on  the  entire 
sequences  (four  domains)  that  includes  these  noncongruent  forms. 


REVISED  SUPERFAMILY  CONCEPT 

Because  of  the  fundamental  role  played  by  the  superfamily  concept  in  organizing  the 
Protein  Sequence  Database,  we  have  recently  developed  a  formal  model  that  encompasses  most 
common  usages  of  the  term  superfamily  and  provides  an  architecture  for  partitioning  the 
database  into  domain  superfamilies  based  on  sequence  homology.  This  architecture  provides  a 
mechanism  for  systematic  analysis  and  refinement  of  information  induced  by  homology. 

In  this  model,  the  concepts  of  superfamily  and  family  are  generalized  to  encompass 
any  scheme  for  classifying  proteins  (or  regions  within  proteins)  that  partitions  the  proteins 
(or  protein  regions)  into  hierarchically  nested  sets  that  are  closed  under  transitivity,  i.e.,  if 
members  A  and  B  are  in  the  same  set  and  B  and  C  are  in  the  same  set,  then  A  and  C  are  also 
in  the  same  set.  Formally,  a  superfamily  is  a  union  over  families.  Families  are  sets  within 
the  superfamily  hierarchy  for  which  the  members  meet  a  threshold  level  of  relatedness.  The 
threshold  concept  of  families  is  based  on  empirical  evidence  that  a  threshold  can  be 
established  whereby  closely  related  proteins  can  be  inferred  to  share  common  biological 
properties. Beyond this threshold, more distantly related forms may diverge in these
properties (although clear evidence of their relatedness remains).
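Partitioning by a pairwise relation into sets that are closed under transitivity is exactly what a disjoint-set (union-find) structure computes. In the sketch below, the related pairs are supplied by the caller (for example, pairs exceeding some similarity threshold, which is an assumption here). Running it with a stricter family-level relation and a looser superfamily-level relation yields nested partitions, provided every family-level pair is also a superfamily-level pair. The names are hypothetical.

```python
class DisjointSets:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def partition(members, related_pairs):
    """Transitively closed groups: A~B and B~C place A, B and C in one set."""
    sets = DisjointSets()
    for member in members:
        sets.find(member)
    for a, b in related_pairs:
        sets.union(a, b)
    groups = {}
    for member in members:
        groups.setdefault(sets.find(member), []).append(member)
    return sorted(sorted(group) for group in groups.values())
```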

We  have  applied  this  model  to  establish  a  classification  of  protein  sequence  homology 
domains.  We  define  a  homology  domain  as  a  subsequence  that  is  related  by  common 
evolutionary  ancestry  to  other  sequence  domains  of  the  same  homology  class.  Domains  may 
encompass  the  entire  protein  sequence,  in  which  case  they  are  denoted  as  “homeomorphic” 
domains.  Note  that  domains  need  not  be  contiguous  and  that  the  classification  is  based  on 
the  conceptually  complete  protein  sequence,  i.e.,  when  only  fragmentary  data  are  available, 
these data may be classified provided that there are sufficient data available to allow the
assumption  that  the  missing  data  conform  with  the  established  relationships.  This  provides 
a  natural  mechanism  for  handling  domains  broken  by  intervening  “loops”  and  for  classifying 
fragmentary  data.  In  particular,  we  consider  conceptually  complete  homeomorphic  domains 




W.  C.  Barker  et  al. 


[Figure 1 diagram: Homeomorph, Domain A/B/C (complex); Domain A/B (complex); Domain C (simple); Domain A (simple); Domain B (simple)]

Figure 1. The homeomorphic domain corresponds with the complete sequence and may contain overlapping
complex and simple domains.


to  be  those  that  can  be  expected  to  comprise  the  entire  protein  sequence  even  though  the 
sequence  data  may  be  fragmentary  and  therefore  such  homology  cannot  be  established 
directly.  Moreover,  relatively  small  intervening  “loops”  can  be  ignored  provided  that  there 
is  no  evidence  that  these  loops  have  established  an  independent  evolutionary  identity,  i.e., 
they  cannot  be  recognized  in  other  related  protein  sequences. 

Partitioning  is  achieved  by  independently  treating  homology  domains  containing 
overlapping  regions.  It  has  been  observed  that  protein  domains  often  coalesce,  forming  a 
composite  domain  that  evolves  as  a  unit  after  the  concatenation  of  the  original  evolutionarily 
independent  domains.  Moreover,  examples  of  the  original  domain  may  remain  in  isolated 
form  and  may  continue  to  evolve  independently.  We  denote  domains  that  do  not  contain 
other  domains  as  “simple”  and  denote  composite  domains  as  “complex.”  Simple  and 
complex  domains  are  classified  along  their  entire  extent.  Composite  domains  are  classified 
independently  of  the  simple  (and/or  complex)  domains  from  which  they  are  composed.  For 
example,  the  complex  domain  A/B  is  recognized  as  a  homology  class  that  is  independent  of 
the  two  simple  domain  classes  A  and  B  (see  Figure  1).  Superfamilies  are  constructed  for  all 
three  of  these  classes.  The  overlap  is  not  ignored;  however,  it  is  not  considered  within  the 
superfamily  classification  scheme.  A  separate  classification  will  be  developed  to  characterize 
the  relationships  among  overlapping  domains. 
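The simple/complex distinction can be expressed as a containment test over the domain intervals annotated on one sequence. The sketch treats domains as contiguous 1-based intervals, a simplification since the text notes that domains need not be contiguous; the example in the usage reproduces the arrangement in Figure 1 with invented coordinates, and `classify_domains` is a hypothetical name.

```python
def classify_domains(domains):
    """Label each domain 'complex' if it contains another annotated domain, else 'simple'.

    domains: dict name -> (start, end), 1-based inclusive positions on one sequence.
    """
    kinds = {}
    for name, (start, end) in domains.items():
        contains_other = any(
            start <= s and e <= end and (s, e) != (start, end)
            for other, (s, e) in domains.items()
            if other != name
        )
        kinds[name] = "complex" if contains_other else "simple"
    return kinds
```

With `{"A/B/C": (1, 300), "A/B": (1, 200), "A": (1, 90), "B": (100, 200), "C": (210, 300)}`, the two composite domains come out complex and A, B and C simple; as the text requires, each would still be classified into its own superfamily independently.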

This  approach  allows  organization  of  protein  sequences  by  homeomorphic  family 
and  superfamily  while  simultaneously  characterizing  the  data  explicitly  by  domain.  The 
Superfamily  record  in  a  database  entry  contains  a  list  of  the  names  of  the  homology  domain 
superfamilies  into  which  the  domains  identified  in  the  sequence  have  been  classified.  These 
names  are  descriptive  and  contain  the  word  “homology.”  The  name  of  the  homeomorphic 
superfamily into which the conceptually complete sequence has been classified appears first
on  the  list  and  does  not  contain  the  word  “homology.”  Previously  described  methods  to 
classify  proteins  without  specifically  addressing  their  domain  architecture  and  effectively 
handling  fragmentary  data  do  not  partition  the  data  rigorously  (Harris  et  al.,  1992;  States  et 
al.,  1993;  Henikoff  &  Henikoff,  1991,  1993). 


ORGANIZATION OF SEQUENCE DATA

Using  this  model,  we  are  converting  the  placement  groups  in  the  database  into 
homeomorphic  protein  superfamilies  and  families  and  are  developing  protocols  to  assign 
sequences  into  families.  In  addition,  we  are  attempting  to  identify  and  annotate  all  homology 
domains.  Previously,  we  classified  only  the  best  characterized  and  most  fully  annotated 
entries  in  the  database.  The  project  to  classify  all  of  the  sequences,  regardless  of  their  state 
of  annotation,  has  proceeded  in  stages.  For  practical  reasons,  we  have  changed  our  approach 
from  one  of  classifying  sequences  into  superfamilies,  which  were  then  further  subdivided, 
to  one  of  classifying  sequences  into  families,  which  can  then  be  grouped  into  superfamilies. 
Dr. Friedhelm Pfeiffer and other collaborators at the PIR-International database center at the
Martinsried  Institute  of  Protein  Sequences  (MIPS)  embarked  upon  the  family  classification 
project.  The  procedure  followed  is  iterative  and  began  from  the  family  structure  in  Release 
36  of  the  database.  Having  first  identified  the  unclassified  sequences  that  belong  in  already 
defined  families,  the  remaining  unclassified  sequences  were  grouped  beginning  with  the 
longest  sequences. 
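The two-stage grouping described above might look like the following sketch. It assumes pairwise scores are available as a lookup (as in the FASTA Database described below), uses a single assumed score `threshold`, and makes one pass rather than iterating; `assign_families` and the generated family labels are hypothetical.

```python
def assign_families(lengths, families, scores, threshold):
    """lengths: name -> sequence length; families: family id -> set of member names;
    scores: frozenset({a, b}) -> pairwise similarity score (missing pairs count as 0).
    Returns a new dict of families; labels "new<n>" are hypothetical.
    """
    families = {fid: set(members) for fid, members in families.items()}
    classified = set().union(*families.values())

    def pair_score(a, b):
        return scores.get(frozenset((a, b)), 0)

    def best_score(name, members):
        return max((pair_score(name, m) for m in members), default=0)

    # Stage 1: place unclassified sequences into already defined families.
    remaining = []
    for name in lengths:
        if name in classified:
            continue
        best = max(families, key=lambda fid: best_score(name, families[fid]), default=None)
        if best is not None and best_score(name, families[best]) >= threshold:
            families[best].add(name)
        else:
            remaining.append(name)

    # Stage 2: group the rest, seeding each new family with the longest sequence left.
    remaining.sort(key=lambda n: lengths[n], reverse=True)
    counter = 1
    while remaining:
        seed = remaining.pop(0)
        family = {seed} | {n for n in remaining if pair_score(seed, n) >= threshold}
        remaining = [n for n in remaining if n not in family]
        families[f"new{counter}"] = family
        counter += 1
    return families
```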

So  that  the  family  classification  procedure  can  be  highly  automated,  we  adopt  a 
working  definition  of  a  protein  family  as  a  set  of  conceptually  complete  sequences  that  can 
be  aligned  end-to-end  without  major  discrepancy  by  standard  multiple  sequence  alignment 
methods.  In  practice,  such  sequences  will  have  the  same  domain  architecture  and  an  overall 
sequence  identity  of  at  least  approximately  50%.  This  is  virtually  the  same  range  for  which 
the  three-dimensional  structure  for  a  protein  can  be  confidently  predicted  from  that  of  a 
homolog  whose  structure  has  been  determined  (Sander  &  Schneider,  1991). 
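The working definition can be turned into a pairwise test. The sketch computes percent identity over aligned positions where neither sequence has a gap and, in line with the screening for congruence of length described below, also requires the two sequences to have similar lengths. The 50% default follows the text; the length-ratio limit of 1.2 is an assumed value, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def same_family(aligned_a, aligned_b, min_identity=50.0, max_length_ratio=1.2):
    """Congruent lengths and at least `min_identity` percent identity."""
    len_a = len(aligned_a.replace("-", ""))
    len_b = len(aligned_b.replace("-", ""))
    congruent = max(len_a, len_b) / min(len_a, len_b) <= max_length_ratio
    return congruent and percent_identity(aligned_a, aligned_b) >= min_identity
```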

The  classification  project  is  facilitated  by  the  FASTA  Database,  which  contains  all 
scores  above  a  certain  threshold  from  FASTA  searches  (Pearson  &  Lipman,  1988)  of  each 
sequence  against  all  others.  The  database  is  updated  each  time  the  sequence  database  is 
updated  (currently  weekly  or  biweekly)  and  an  interactive  retrieval  system  allows  annotators 
to  query  it  (see  Figure  2).  The  FASTA  Database  is  used  for  the  selection  of  candidates  for 
classification  into  a  family.  The  sequences  are  then  aligned  automatically  and  the  aligned 
pairs are screened for congruence of length and threshold level of similarity. Those that meet
rather stringent requirements are routinely classified; others are examined and classified by
scientific staff. As of Release 41, over 70% of the sequences in the database had been
analyzed. Over 90% of those were classified as belonging to a family or as being the sole
representative of a new family. About 5% of all entries are considered not classifiable by
this method, generally because the sequences are too short or fragmentary.

Figure 2. Retrieval of scores from the FASTA database. On the basis of this interactive display, the sequences
from dog and rabbit (respectively) were merged to produce single entries. The resulting seven sequences were
aligned and assigned placement numbers. The original papers were then reviewed and annotation made
consistent throughout this family.

The  detection  of  homology  domains  within  sequences  and  their  classification  can  be 
approached  in  a  similar  way.  The  group  at  MIPS  periodically  partitions  all  of  the  sequences 
into  recognized  homology  domains  (annotated  as  features  in  the  database)  and  unclassified 
subsequences. A database is created that contains FASTA scores (above a threshold) of all
of these subsequences against each other. Pairwise local alignments are constructed and used
to select the boundaries of provisional domains. These assignments are later checked and
refined on the basis of multiple sequence alignments. Once the known domains are
substantially annotated in the database, FASTA scores of unclassified segments searched against all
other unclassified segments will be used to reveal additional domain homologies between
sequences classified into different families.
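Finding the subsequences that remain unclassified once annotated domains are set aside is an interval-complement computation, sketched below. Coordinates are assumed to be 1-based and inclusive, overlapping domains are allowed, and the optional `min_length` filter (an assumption) drops fragments too short to search usefully; `unclassified_segments` is a hypothetical name.

```python
def unclassified_segments(seq_length, domains, min_length=1):
    """Return (start, end) runs of a sequence not covered by any domain interval.

    domains: list of (start, end), 1-based inclusive; overlaps are allowed.
    """
    segments = []
    next_free = 1
    for start, end in sorted(domains):
        if start > next_free:
            segments.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= seq_length:
        segments.append((next_free, seq_length))
    return [(s, e) for s, e in segments if e - s + 1 >= min_length]
```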


INFERENCE  BASED  ON  HOMOLOGY 

Assigning  sites  of  biological  interest  by  homology  requires  the  construction  of  a 
multiple  sequence  alignment.  Mathematically  rigorous  algorithms  for  multiple  sequence 
alignment  cannot  guarantee  biologically  realistic  alignments,  particularly  for  more  distantly 
related  sequences.  Nevertheless,  for  sequences  and  subsequences  that  are  longer  than  about 
50  residues  and  that  are  at  least  35-50%  identical,  the  major  features  of  an  alignment  are 
reproduced  by  many  algorithms.  Within  this  realm,  alignments  derived  by  comparison  of 
three-dimensional  structures  also  agree  well  with  those  derived  by  sequence  comparison 
methods (Sander & Schneider, 1991).

Sequence homology among domains does not guarantee close structural homology
or preservation of function. For example, some calmodulin repeat homology domains may
not adopt the EF-hand conformation or bind calcium. Nevertheless, a combination of
homology and other biological or chemical knowledge frequently allows properties of
domains to be predicted. This in turn may allow prediction of functional characteristics of a
multidomain protein even when it is the first sequenced example of its type. The homology
domain superfamilies that have been identified and annotated in the PIR-International
Protein Sequence Database, Release 42 (September 1994), are listed in Table 1.

Table  1.  Homology  domain  superfamilies  (September  1994) 

(S)-2-hydroxy-acid oxidase homology
3-dehydroquinate dehydratase homology
3-dehydroquinate synthase homology
3-hydroxyacyl-CoA dehydrogenase homology
3-hydroxyisobutyrate dehydrogenase homology
3-oxoadipate CoA-transferase alpha chain homology
3-oxoadipate CoA-transferase beta chain homology
3-phosphoshikimate 1-carboxyvinyltransferase homology
6-phosphofructokinase 1 homology
acetate–CoA ligase homology
adenylate cyclase homology
ADP,ATP carrier protein repeat homology
agrin inhibitor-like repeat homology
alanine dehydrogenase homology
alpha-actinin actin-binding domain homology
alpha-amylase core homology
animal Kunitz-type proteinase inhibitor homology
ankyrin repeat homology
annexin repeat homology
antileukoproteinase repeat homology
apple homology
aspartate kinase homology
aspartate/ornithine carbamoyltransferase homology
astacin homology
Bacillus dihydroorotase homology
Bacillus phosphoribosylamine–glycine ligase homology
barley yellow dwarf virus RNA-directed RNA polymerase homology
Berne virus hemagglutinin homolog homology
beta-lactamase OXA2 homology
Bowman-Birk inhibitor repeat homology
BUD5 protein homology
C-type lectin homology
C1r/C1s repeat homology
cadherin repeat homology
calmodulin repeat homology
calpain catalytic domain homology
carbamoyl-phosphate synthase (ammonia) homology
carbamoyl-phosphate synthase (glutamine-hydrolyzing) large chain homology
cellular retinaldehyde-binding protein homology
Chalara lysozyme homology
cholinesterase homology
cold shock domain homology
complement factor H repeat homology
cpl repeat homology
crk transforming protein homology
cystatin homology
cytochrome b5 core homology
cytochrome b6 homology
cytochrome c homology
cytochrome c3 homology
cytochrome-b5 reductase homology
cytokine receptor homology
desulforedoxin homology
discoidin I N-terminal homology
dnaJ N-terminal domain homology
EGF homology
elongation factor Tu homology
endozepine homology
equine herpesvirus 1 glycoprotein homology
erbA transforming protein homology
ets DNA-binding domain homology
eubacterial ribosomal protein L27 homology
eubacterial ribosomal protein S15 homology
ferredoxin homology
ferroxidase repeat homology
fibrinogen beta/gamma homology
fibronectin type I repeat homology
fibronectin type II repeat homology
fibronectin type III repeat homology
flavodoxin homology
gelsolin repeat homology
glutamate receptor homology
gramicidin S synthetase I repeat homology
guanylate cyclase catalytic domain homology
guanylate kinase homology
H+-transporting ATP synthase alpha chain homology
Helicobacter urease alpha chain homology
Helicobacter urease beta chain homology
hemolysin A homology
hemopexin repeat homology
herpesvirus tegument protein homology
herpesvirus thymidine kinase homology
hevein chitin-binding domain homology
hexokinase homology
hisI bifunctional enzyme homology
hisI protein homology
histidine–tRNA ligase homology
histidinol dehydrogenase homology
HMG box homology
homeobox homology
homoserine dehydrogenase homology
human herpesvirus 1 UL35 protein homology
imidazoleglycerol-phosphate dehydratase homology
immunoglobulin homology
influenza C virus nonstructural protein NS1/NS2 homology
Kazal proteinase inhibitor homology
kringle homology
lactaldehyde reductase homology
large structural phosphoprotein homology
LDL receptor ligand-binding repeat homology
LDL receptor YWTD-containing repeat homology
LDL receptor/EGF precursor homology
leucine-rich alpha-2-glycoprotein repeat homology
leukocyte common antigen cytosolic domain homology
LIM metal-binding repeat homology
lipocalin homology
LTE1 protein homology
malK protein homology
MAP2/tau repeat homology
methylated-DNA–protein-cysteine S-methyltransferase homology
methylphosphotriester-DNA methyltransferase homology
motor domain homology
myb DNA-binding repeat homology
myc transforming protein homology
myosin head homology
NAD(P)+ transhydrogenase (B-specific) alpha chain homology
NAD(P)+ transhydrogenase (B-specific) beta chain homology
NGF receptor repeat homology
nifA central domain homology
orotate phosphoribosyltransferase homology
orotidine-5'-phosphate decarboxylase homology
osteonectin homology
parathyroid hormone homology
peptidylglycine monooxygenase I homology
phage T4 DNA topoisomerase (ATP-hydrolyzing) medium chain homology
phage T4 lysozyme homology
phosphoprotein phosphatase homology
phosphoribosylaminoimidazole carboxylase carbon dioxide-fixation chain homology
W.  C.  Barker  et  al. 


phosphoribosylaminoimidazole carboxylase catalytic chain homology
phosphoribosylformylglycinamidine cyclo-ligase homology
phosphoribosylglycinamide formyltransferase homology
phosphotransferase system glucose-specific enzyme II, factor II homology
phosphotransferase system glucose-specific enzyme II, factor III homology
phosphotransferase system mannitol-specific enzyme II/III homology
plastoquinol–plastocyanin reductase 17K protein homology
pleckstrin repeat homology
potato leaf roll virus coat protein homology
POU domain homology
protein 4.1 membrane-binding domain homology
protein kinase C C2 region homology
protein kinase C zinc-binding repeat homology
protein kinase homology
protein kinase regulatory chain nucleotide-binding repeat homology
protein-glutamate O-methyltransferase homology
protein-tyrosine-phosphatase homology
rel homology
response regulator homology
ribonucleoprotein repeat homology
rRNA N-glycosidase homology
rubredoxin homology
S-locus-specific glycoprotein homology
serum albumin repeat homology
SH2 homology
SH3 homology
shikimate dehydrogenase homology
shikimate kinase homology
sigma factor katF homology
sigma factor region 1 homology
spectrin/dystrophin repeat homology
statherin/histatin signal sequence homology
subtilisin homology
sucrase/isomaltase homology
sucrose/sucrose-phosphate synthase homology
sulfite oxidase homology
thioredoxin homology
thymidylate synthase homology
trefoil homology
trpC homology
trpD homology
trpD-trpG homology
trpF homology
trpG homology
trypsin homology
tryptophan synthase alpha chain homology
tryptophan synthase beta chain homology
type I dihydrofolate reductase homology
ubiquinol–cytochrome-c reductase 11K protein homology
ubiquitin homology
V/P protein homology
vaccinia virus 13.6K HindIII-C protein homology
vaccinia virus 27.4K HindIII-C protein homology
vaccinia virus 8.6K HindIII-C protein homology
vaccinia virus 8.8K HindIII-C protein homology
VH1-type protein-tyrosine-phosphatase homology
villin headpiece homology
virB4.1 protein homology
virB4.2 protein homology
187 superfamilies found

MUTATIONAL ANALYSIS OF THE BPTI FOLDING PATHWAY

INTRODUCTION

Over the past three decades, considerable effort has been focused on elucidating the
mechanisms by which polypeptide chains fold into well-defined three-dimensional structures
(Kim & Baldwin, 1990; Creighton, 1992a; Matthews, 1993). Major goals of these
studies include the identification and characterization of partially folded intermediates and
the analysis of transition states that represent the energetic barriers in the folding mechanism.
Recently, there has been great progress in the structural analysis of folding intermediates by
high-resolution NMR spectroscopy of intermediate analogs and of native proteins that have
been isotopically labeled during refolding. Structural analysis alone, however, is not sufficient
to determine why particular intermediates form or what types of interactions stabilize
their conformations. By their very nature, transition states are even more difficult to
characterize directly. Questions about folding energetics and the roles of individual interactions
in determining the folding mechanism can often be addressed by studying the folding
of protein variants that differ by relatively small perturbations of the covalent structure.
Recently developed genetic techniques have made it possible to alter virtually any amino
acid residue in a protein, and mutational methods have now been used to study the folding
mechanisms of several proteins (Fersht et al., 1992; Goldenberg, 1992a; Jennings et al.,
1992). We describe here some of our recent work using amino acid replacements to study
the folding of a particularly well-characterized protein, bovine pancreatic trypsin inhibitor
(BPTI).

BPTI is a small protein, composed of 58 amino acid residues, that folds into a single
compact domain. The native conformation is stabilized by three disulfide bonds, and the
protein can be unfolded by reducing the disulfides. Disulfide-bonded intermediates accumulate
during the oxidative refolding of the reduced protein, and these intermediates can be
chemically trapped, physically separated and characterized individually. The disulfide-coupled
refolding pathway (Figure 1) for BPTI was first characterized by T.E. Creighton in the
1970s and has continued to be the subject of extensive analysis (Creighton, 1978; Creighton
& Goldenberg, 1984; Weissman & Kim, 1991; Creighton, 1992b; Goldenberg, 1992b).
During the past few years, new methods have been developed to trap and isolate the
intermediates (Weissman & Kim, 1991), analogs of the intermediates have been analyzed
by high-resolution NMR (Staley & Kim, 1992; van Mierlo et al., 1991a,b, 1993, 1994), and
mutational methods have been used to assess the energetic contributions of individual amino
acid residues (Goldenberg et al., 1989, 1992; Coplen et al., 1990; Zhang & Goldenberg, 1993;
Mendoza et al., 1994).

Figure 1. The BPTI folding pathway. Intermediates are identified by the disulfide bonds they contain, and the
approximate path of the polypeptide chain in the intermediates is shown schematically. Intermediates grouped
together in the boxes labeled I and II interconvert rapidly on the time scale of the folding experiments, except
[5-55], which rearranges with the other one-disulfide intermediates relatively slowly. Reproduced, with
permission, from Goldenberg, 1992b.

While  the  recent  advances  in  characterizing  the  BPTI  folding  intermediates  have  led 
to  a  much  better  understanding  of  this  pathway,  the  new  results  have  also  been  accompanied 
by  considerable  controversy.  Two  of  the  more  controversial  aspects  concern  the  origin  of  the 
intramolecular  rearrangements  in  the  pathway  and  the  nature  of  the  major  transition  states. 
As  described  in  the  following  sections,  we  have  attempted  to  address  these  questions  by 
analyzing  the  folding  mechanisms  and  kinetics  of  genetically  modified  BPTI  variants  (Zhang 
&  Goldenberg,  1993;  Mendoza  et  al.,  1994). 


KINETIC TRAPS AND THE ORIGIN OF INTRAMOLECULAR REARRANGEMENTS

One of the most striking aspects of the BPTI folding pathway is the role of
intramolecular rearrangements in forming the three disulfides of the native protein
(Creighton, 1977a). Of the various two-disulfide intermediates that accumulate during
folding, only one, containing the 30-51 and 5-55 disulfides and designated N_SH^SH, readily forms
a third disulfide. However, this species does not form readily from the population of one-disulfide
intermediates. Instead, the kinetically preferred pathway involves the formation of other
two-disulfide intermediates, which then undergo intramolecular rearrangements to yield
N_SH^SH (Creighton, 1977a; Goldenberg, 1988). Although N_SH^SH can, under certain conditions,
form directly, the rate constant for the intramolecular step in forming the second disulfide is
approximately 1,000-fold lower than for the competing reactions.
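To put the 1,000-fold figure on an energy scale, transition-state theory relates the ratio of rate constants for two competing steps from the same starting species to the difference in their activation free energies. The worked numbers below are a sketch under two assumptions: a temperature of 25 °C, which is not stated for this comparison, and equal pre-exponential factors for the two steps.

$$
\Delta\Delta G^{\ddagger} = RT \ln\frac{k_{\text{competing}}}{k_{\text{direct}}}
\approx \left(1.987\times 10^{-3}\ \text{kcal\,mol}^{-1}\,\text{K}^{-1}\right)(298\ \text{K})\,\ln(1000)
\approx 0.592 \times 6.91 \approx 4.1\ \text{kcal\,mol}^{-1}\ (\approx 17\ \text{kJ\,mol}^{-1})
$$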
Because the thiol-disulfide exchange reactions involve a nucleophilic displacement
of one sulfur atom in an existing disulfide by a thiol sulfur atom, the rearrangement(s) that
produce N_SH^SH must involve at least one non-native disulfide. Two species with one non-native
disulfide each, [30-51,5-14] and [30-51,5-38], have been detected in refolding reactions, and
it seems likely that one or both of these species play this role. Weissman and Kim (1991),
using improved methods for trapping and isolating the intermediates, recently found that the
non-native intermediates are somewhat less stable than earlier experiments had indicated.
While this result might, at first glance, suggest that the non-native intermediates are less
important than originally thought, the evidence for the rearrangement mechanism remains
solid, and it is generally agreed that at least one non-native species must act as an
intermediate, though it may accumulate to only low levels.

Two other intermediates that accumulate during refolding, [30-51,14-38] and [5-55,14-38],
also contain two of the three native disulfides and have folded conformations
very similar to that of the native protein, but these species do not readily form a third
disulfide. In one case, [5-55,14-38], the failure to form the third disulfide is due to the
inaccessibility of the remaining two cysteine thiols to the disulfide reagents (such as the
disulfide forms of glutathione or dithiothreitol) used as oxidants (States et al., 1984;
Creighton & Goldenberg, 1984). The other case, [30-51,14-38], is somewhat more complex,
however, since the thiols are at least partially accessible to disulfide reagents (Creighton,
1977b). Here, it appears that at least part of the reason that the third disulfide does not form
is that steric constraints inhibit formation of the transition state for the
intramolecular step required for direct disulfide formation. At neutral or slightly
alkaline pH, [5-55,14-38] accumulates for very long times and acts as a kinetic trap during
folding, limiting the fraction of molecules that form a third disulfide even after several hours
(States et al., 1984; Creighton & Goldenberg, 1984). The other species, [30-51,14-38], can
rearrange with other two-disulfide intermediates at pH 8.7, but at pH 7.3 these rearrangements
are very unfavorable and [30-51,14-38] also acts as a kinetic trap (Weissman & Kim,
1991).

Since intramolecular rearrangements are necessary to convert [5-55,14-38] or [30-51,14-38]
to N_SH^SH, it might appear that the rearrangements in the BPTI pathway arise only
because of the stabilities of these two species and their inability to directly form a third
disulfide (Weissman & Kim, 1992). We have recently described a mutant form of BPTI, in
which Tyr 35 is replaced by Leu, that we believe helps clarify this issue (Zhang &
Goldenberg, 1993). This variant is one of eight aromatic-to-Leu mutants we have constructed
to examine the roles of the four Phe and four Tyr residues of the wild-type protein. Like most
of the other aromatic residues in BPTI, Tyr 35 is largely buried in the native protein (Figure 2),
and its replacement with Leu severely destabilizes the folded conformation. The folded
mutant protein has circular dichroism spectra similar to those of the wild-type protein and
is an active trypsin inhibitor, suggesting that the mutation has not altered the overall
conformation of the protein.

The distribution of folding intermediates for the Y35L variant is shown in Figure 3,
along with that for the wild-type protein. In this experiment, the proteins were unfolded by
reducing the three disulfides, refolding was initiated at pH 7.3 by adding 0.1 mM oxidized
glutathione (GSSG), and the reactions were quenched at the indicated times by acidification.
The trapped intermediates were separated by reversed-phase HPLC as described by Weissman
and Kim (1991), and the disulfides in the isolated intermediates were determined by
peptide mapping. The most pronounced effect of the mutation is the elimination of two
intermediates, [30-51,14-38] and [5-55,14-38]. These are the two species that act as kinetic
traps during refolding of the wild-type protein, and their absence greatly increases the rate
at which the native protein appears.
Figure 2. Schematic representation of the native structure of BPTI, showing the Tyr 35 side-chain and the sites
of the amino acid replacements used to analyze the major transition states for unfolding and folding. Adapted
from Figure 1 of Mendoza et al., 1994, and drawn with the program Molscript by Per Kraulis.


[Figure 3 panels: chromatograms plotted against HPLC retention time]

Figure 3. HPLC analysis of intermediates trapped at various times during the refolding of wild-type and Y35L
BPTI. Refolding reactions were carried out at pH 7.3 in the presence of 0.1 mM GSSG. At the indicated times,
the reactions were quenched by the addition of formic acid to a final concentration of 5%. The trapped
intermediates were applied to a Vydac C18 column and eluted with a gradient of acetonitrile in 0.1%
trifluoroacetic acid as absorbance of the eluent at 229 nm was monitored. The identities of the intermediates
were determined by peptide mapping. Reproduced, with permission, from Zhang & Goldenberg, 1993.
Figure 4. Kinetics of folding and unfolding of unmodified Y35L BPTI (filled circles) and modified forms in
which the thiols of Cys 14 and 38 were blocked with iodoacetamide (open squares) or iodoacetate (open circles).
Refolding was carried out in the presence of 80 mM oxidized dithiothreitol at pH 8.7, 25 °C, while unfolding
reactions contained 2 mM dithiothreitol but were carried out under conditions that were otherwise the same as
for the refolding experiments. The refolding reactions were monitored by gel electrophoresis of samples trapped
by reaction with iodoacetate at the indicated times. For the modified protein, the indicated percent folded
represents the concentration of [30-51,5-55], while for the unmodified protein the percent folded represents the
sum of [30-51,5-55] (N_SH^SH) and N. Reproduced, with permission, from Zhang & Goldenberg, 1993.


The major intermediates that accumulate during refolding of the Y35L protein are
[30-51], [30-51,5-55] (N_SH^SH) and a smaller amount of [30-51,5-14]. Since the two
kinetically trapped intermediates are eliminated, it might be expected that the mutant protein
would fold by a more direct pathway, such as

R → [30-51] → N_SH^SH → N,

where R and N represent the fully reduced and native proteins. In order to test this possibility,
we prepared modified forms of the mutant protein in which the 14-38 disulfide was
selectively reduced to produce N_SH^SH and the resulting thiols were alkylated with either
iodoacetamide or iodoacetate. If the mutant protein folds primarily via the direct mechanism
shown above, then the proteins lacking the Cys 14 and 38 thiols should be able to form
N_SH^SH as readily as the unmodified mutant protein. If, on the other hand, intramolecular
rearrangements play a role in the formation of N_SH^SH, blocking these thiols should greatly
reduce the rate of forming this species, as has been demonstrated previously for the wild-type
protein (Creighton, 1977a; Goldenberg, 1988).

The folding and unfolding kinetics of the modified (open symbols) and unmodified
(filled symbols) forms of the mutant protein are shown in Figure 4. The unmodified protein
both folds and unfolds more rapidly than the forms in which the Cys 14 and 38 thiols have
been alkylated. The rate constant for the intramolecular step in the formation of a second
disulfide bond is approximately 200-fold greater for the unmodified protein than it is in the
absence of the two thiols. These results indicate that during the refolding of the unmodified
Y35L BPTI the preferred mechanism involves formation of two-disulfide intermediates other
than N_SH^SH, followed by intramolecular rearrangements to produce this species. As in the
folding of the wild-type protein, these rearrangements must involve at least one species
containing a non-native disulfide, such as [30-51,5-14].

These results argue that the predominance of the rearrangement mechanism cannot
be accounted for by the stabilities of the kinetically trapped intermediates, since these
species do not accumulate during the refolding of the mutant protein. Further, the low rate
of forming N_SH^SH is not due to inaccessibility of the Cys 5 and 55 thiols in the [30-51]
intermediate, since these thiols have been shown to react readily with oxidized glutathione
(Creighton, 1977a; Goldenberg, 1988). Rather, the rearrangement mechanism is kinetically
preferred because of the very low rate of the intramolecular step in forming N_SH^SH versus the
other two-disulfide intermediates. What, then, is the structural basis for this kinetic preference?
The answer to this question must lie in the energetic and structural differences between the
one-disulfide intermediates and the transition state for directly forming N_SH^SH. The structures
of the intermediates have been characterized through NMR spectroscopy of analogs (van
Mierlo et al., 1991a, 1993; Staley & Kim, 1992), but that of the transition state can only be
inferred from kinetic experiments, as described in the following section.


MUTATIONAL  ANALYSIS  OF  THE  MAJOR  TRANSITION  STATES 

During the refolding of reduced BPTI, the slowest intramolecular processes are
associated with the formation of N_SH^SH, either by direct disulfide formation from the
one-disulfide intermediates, as discussed above, or by rearrangement of other two-disulfide
intermediates (II in Figure 1). The reverse reactions are also very slow: during
reductive unfolding of the native protein, the 14-38 disulfide is rapidly reduced to produce
N_SH^SH, but further reduction and unfolding take several hours. Because the refolding and
unfolding experiments are carried out under identical conditions, except for the concentrations
of thiol and disulfide reagents present, the transition states for forming N_SH^SH during
folding are expected to be equivalent to those for breaking down this species during
unfolding. Thus, the transition states for forming N_SH^SH can be characterized by analyzing the
kinetics of unfolding, which are experimentally more accessible than the folding kinetics.

In order to characterize these transition states, we have examined the effects of a
series of destabilizing amino acid replacements on the kinetics of direct reduction of N_SH^SH
and rearrangement of this species (Mendoza et al., 1994). The rationale of these experiments
is illustrated in Figure 5. As shown in the figure, N_SH^SH can either undergo intramolecular
rearrangements to generate other two-disulfide intermediates, such as [30-51,5-14], or can
be directly reduced by reaction with a thiol reagent, shown as R-SH in the figure. In either
reaction, formation of the transition state requires attack of one of the buried disulfides in
N_SH^SH by a thiol, and at least part of the molecule must undergo an unfolding transition. It is
possible, however, that different regions of N_SH^SH would have to unfold during the two
processes. If this is the case, then substitutions that destabilize some regions of the protein
would be expected to selectively enhance the rate of direct reduction, while other replacements
might preferentially enhance the rate of rearrangement. Thus, it should be possible to
characterize the two transition states by looking for amino acid replacements that selectively
enhance one rate or the other.

Toward this end, we measured the rates of direct reduction and intramolecular
rearrangement for 18 BPTI variants that were known to increase the overall rate of reductive
unfolding. As illustrated in Figure 2, these replacements are located at 13 sites throughout
the protein.

Figure 5. Rationale of a mutational experiment to characterize the transition states for direct reduction of
N_SH^SH and intramolecular rearrangement of this species. As shown by the schematic representations, both
processes are expected to require at least partial unfolding of N_SH^SH, but the two processes may involve disruption
of different parts of the folded conformation. If the two corresponding transition states contain residual
stabilizing interactions, amino acid replacements at different sites might be expected to selectively enhance
the rate of one reaction or the other.

To determine the rate constants for direct reduction and intramolecular rearrangement
of N_SH^SH, the kinetics of reductive unfolding were measured in the presence of varying
concentrations of reduced dithiothreitol (DTT_SH^SH). As for the wild-type protein, the native
mutant proteins were quickly converted to a native-like two-disulfide intermediate, which was
then reduced further at a rate that depended upon the concentration of DTT_SH^SH. The rate
constant for the intramolecular rearrangement, k_rearr, was estimated from the value of the
apparent rate constant for disappearance of N_SH^SH extrapolated to zero DTT_SH^SH, and the rate
constant for direct reduction was determined from the slope of the apparent rate constant versus
DTT_SH^SH concentration. All of the amino acid replacements examined were found to increase both
rate constants, in some cases by as much as 100,000-fold.
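The analysis described above amounts to fitting a straight line to the apparent first-order rate constant as a function of reduced DTT concentration. The intercept estimates the rearrangement rate constant, and the slope estimates the second-order rate constant for direct reduction. The sketch below assumes the simple linear model k_app = k_rearr + k_red·[DTT_SH^SH], where the symbol k_red is introduced here only for illustration. It uses an ordinary least-squares fit in plain Python, and the numbers are synthetic, not data from the study.

```python
def fit_unfolding_rates(dtt_conc, k_app):
    """Least-squares fit of k_app = k_rearr + k_red * [DTT]; returns (k_rearr, k_red).

    dtt_conc: reduced DTT concentrations (M)
    k_app:    apparent first-order rate constants for loss of N_SH^SH (s^-1)
    """
    n = len(dtt_conc)
    if n < 2 or n != len(k_app):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(dtt_conc) / n
    mean_y = sum(k_app) / n
    sxx = sum((x - mean_x) ** 2 for x in dtt_conc)
    if sxx == 0:
        raise ValueError("need at least two different DTT concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dtt_conc, k_app))
    k_red = sxy / sxx                  # slope: second-order constant for direct reduction (M^-1 s^-1)
    k_rearr = mean_y - k_red * mean_x  # intercept: rate extrapolated to zero DTT (s^-1)
    return k_rearr, k_red


# Synthetic, noise-free values (not data from the study):
# k_rearr = 2e-5 s^-1 and k_red = 1e-3 M^-1 s^-1 were chosen arbitrarily.
dtt = [0.01, 0.05, 0.10, 0.20]
k_obs = [2e-5 + 1e-3 * c for c in dtt]
k_rearr, k_red = fit_unfolding_rates(dtt, k_obs)
print(f"k_rearr = {k_rearr:.2e} s^-1, k_red = {k_red:.2e} M^-1 s^-1")
```

With real data, the intercept is the least precisely determined quantity when k_rearr is small relative to k_red·[DTT]. Measurements at low DTT concentrations therefore carry most of the information about the rearrangement rate.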

In Figure 6, the rate constants for the two processes are plotted against one another on
logarithmic scales. As shown, there is a remarkably good correlation between the logarithms
of the two rate constants: the line shown in the figure has a slope of 1.3 and a correlation
coefficient of 0.97 for data covering five orders of magnitude. Since the logarithms of the
rate constants are linearly related to the free energy differences between the common ground
state (N_SH^SH) and the two transition states, the correlation illustrated in Figure 6 implies an
energetic similarity between the transition states for rearrangement and direct reduction.
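The two summary statistics quoted for Figure 6 (slope 1.3, correlation coefficient 0.97) come from a straight-line fit on logarithmic axes. The sketch below shows one way to compute them. It assumes a least-squares fit of log10(y) on log10(x) together with the Pearson correlation of the logarithms. Which rate constant goes on which axis is left to the caller, and the sample values are synthetic placeholders, not the published measurements. Because log k is linear in the activation free energy, a slope close to 1 would indicate that a given replacement lowers both barriers by comparable amounts.

```python
import math
from typing import Sequence, Tuple


def log_log_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, pearson_r) for log10(y) regressed on log10(x)."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired, positive values")
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx                 # least-squares slope on log scales
    r = sxy / math.sqrt(sxx * syy)    # Pearson correlation of the logarithms
    return slope, r


# Synthetic values constructed so that y = 10 * x**1.3 exactly (not measured data).
x = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]
y = [10 * v ** 1.3 for v in x]
slope, r = log_log_fit(x, y)
print(round(slope, 3), round(r, 3))  # 1.3 1.0
```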
Figure 6. Correlation between the rate constants for direct reduction of N_SH^SH and intramolecular
rearrangement of this species for different BPTI variants. Reproduced, with permission, from Mendoza et
al., 1994.


We believe that the simplest interpretation of these results is that all of the amino acid
replacements examined destabilize interactions in N_SH^SH that must be disrupted or weakened
in both of the transition states. Because these substitutions alter a variety of different types
of residues located throughout much of the folded protein, it seems most likely that the
structure of N_SH^SH is extensively disrupted in both transition states, with few if any interactions
other than the disulfide bonds stabilizing specific conformations. This conclusion is consistent
with a variety of other evidence about the transition states (Weissman & Kim, 1991, 1992;
Mendoza et al., 1994).

These results can help explain why direct formation of N_SH^SH is disfavored during
folding and why intramolecular rearrangements play a prominent role in the BPTI folding
pathway. Since the transition states for direct reduction or rearrangement of N_SH^SH are expected
to be equivalent to those for the reverse reactions required to form N_SH^SH, any stable interactions
present in intermediates preceding this species during folding are probably disrupted in the
transition states. NMR studies of analogs of the major one-disulfide intermediates, [30-51]
and [5-55], indicate that both of these species contain significant native-like structure.
Analogs of [5-55] have folded conformations that are nearly indistinguishable from, though
less stable than, that of the native protein (van Mierlo et al., 1991a; Staley & Kim, 1992).
An analog of [30-51] has been shown to contain about two thirds of the structure of the native
protein, including the central β-sheet and α-helix, but some segments of the polypeptide
chain, including the N-terminal 15 residues, are disordered (van Mierlo et al., 1993). Our
mutational analysis of the transition states suggests that the structure present in the
one-disulfide intermediates probably has to be disrupted during direct formation of N_SH^SH, thus
contributing to the high energy barrier observed for this reaction. Unfortunately, it is not
currently known how stable the structure in the one-disulfide intermediates is, so it is
difficult to conclude whether this effect is sufficient to account entirely for the observed rate
reduction, but it is likely to be a significant factor.

The other native-like two-disulfide intermediates, [5-55,14-38] and [30-51,14-38], are believed to be formed by direct formation of the 14-38 disulfide in the corresponding one-disulfide intermediates. Because the 14-38 disulfide is located on the surface of the native protein and can be both reduced and formed readily in the native conformation, native-like structure in the one-disulfide intermediates is expected to enhance, rather than impede, the rate of forming these species.

Analogs of the non-native two-disulfide intermediates, [30-51,5-14] and [30-51,5-38], have also been studied by high resolution NMR (van Mierlo et al., 1994). These species have conformations very similar to that of the [30-51] intermediate, in spite of the non-native disulfide between Cys 5 and either Cys 14 or Cys 38. Because the N-terminal 15 residues of [30-51] are disordered, it appears that the non-native disulfides can form without significant disruption of structure already present in [30-51], consistent with kinetic results indicating that these species are formed directly from the population of one-disulfide intermediates (Creighton, 1977a).

Thus, it appears that all of the significant two-disulfide intermediates, except N_SH^SH, can be formed from the one-disulfide intermediates without disrupting structure that is already present. These two-disulfide species, therefore, form in preference to N_SH^SH. Once formed, however, these intermediates cannot directly form a third disulfide to yield the native protein: two of the intermediates contain non-native disulfides, and the other two, [5-55,14-38] and [30-51,14-38], contain so much native-like structure that direct formation of the remaining buried disulfide is very slow. Thus, each of these species must undergo one or more intramolecular rearrangements to yield N_SH^SH before a third disulfide can be incorporated. Our mutational studies, and other results, indicate that these rearrangements require extensive unfolding of the structure present in the intermediates. As a consequence, both the direct disulfide formation pathway and the rearrangement pathway require disruption of structure already present in the protein. The rearrangement pathway is preferred kinetically because, at the point where the pathways diverge (i.e. formation of a second disulfide), direct disulfide formation to give N_SH^SH is particularly slow.
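The kinetic argument can be illustrated with simple partitioning: when a common intermediate can react by several competing (pseudo-)first-order routes, the fraction taking each route is its rate constant divided by the sum of all of them, so a route with a very high barrier is hardly used even if its product lies on the way to the native state. The rate constants below are hypothetical and chosen only to show the arithmetic.

```python
def branch_fractions(rate_constants: dict) -> dict:
    """Share of molecules leaving a common intermediate by each competing route.

    Assumes parallel (pseudo-)first-order reactions from the same species, so each
    route's share is k_i / sum(k).
    """
    total = sum(rate_constants.values())
    if total <= 0:
        raise ValueError("rate constants must sum to a positive value")
    return {route: k / total for route, k in rate_constants.items()}


# Hypothetical pseudo-first-order rate constants (s^-1) out of the one-disulfide
# intermediates; illustrative numbers, not measured BPTI values.
routes = {
    "direct formation of N_SH^SH": 0.001,
    "formation of other two-disulfide species": 0.099,
}
for route, share in branch_fractions(routes).items():
    print(f"{route}: {share:.1%}")
```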


SUMMARY 

The experiments described here illustrate how mutational analysis can be used to probe the energetics of a protein folding pathway. Studies with a mutant in which native-like intermediates are destabilized have demonstrated that the rearrangements observed in the BPTI pathway are not due simply to the stabilities of these kinetically trapped species. Measurements of the effects of mutations on unfolding kinetics indicate that the major transition states for folding and unfolding are extensively unfolded. These results, together with structural studies of the intermediates, suggest that the rearrangement mechanism seen in the BPTI folding pathway is a consequence of steric constraints in the major one-disulfide intermediates.

Results such as those described here also illustrate the high degree of cooperativity among the individual interactions that stabilize protein conformations and determine folding mechanisms: a single amino acid replacement can dramatically change the distribution of folding intermediates (Figure 3) or increase the rate of reducing a protein disulfide by as much as 100,000-fold (Figure 6). Understanding the physical basis of this cooperativity is now one of the major challenges in the study of protein structure, folding and function. Further analysis of genetically modified proteins, using thermodynamic, kinetic, structural and computational methods, is likely to be an important means of addressing these problems.
INTRODUCTION

The principal source of geometric information used to solve three-dimensional structures of macromolecules by NMR resides in short (< 5 Å) approximate interproton distance restraints derived from nuclear Overhauser enhancement (NOE) measurements (1-5). In order to extract this information it is essential first to assign the spectrum of the macromolecule in question completely and then to assign as many structurally useful NOE interactions as possible. The larger the number of NOE restraints, the higher the precision and accuracy of the resulting structures (5-7). Indeed, with current state-of-the-art methodology it is now possible to obtain NMR structures of proteins at a precision and accuracy comparable to 2 Å resolution crystal structures (7-9).
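The link between NOE intensity and distance that underlies these restraints is the r^-6 dependence of the NOE in the isolated spin-pair approximation. A minimal sketch, assuming a reference proton pair of known distance (the 2.5 Å used here is hypothetical) and neglecting spin diffusion; the strong/medium/weak upper bounds of 2.7, 3.3 and 5.0 Å are a common classification scheme, used here only for illustration.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Isolated spin-pair approximation: NOE intensity is proportional to r**-6,
    so r = r_ref * (I_ref / I) ** (1/6)."""
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Illustrative strong/medium/weak classes: (cutoff, (lower bound, upper bound)) in Å.
CLASSES = [(2.7, (1.8, 2.7)), (3.3, (1.8, 3.3)), (5.0, (1.8, 5.0))]


def restraint_bounds(distance):
    """Bounds of the first class containing `distance`; None if beyond 5 Å."""
    for cutoff, bounds in CLASSES:
        if distance <= cutoff:
            return bounds
    return None


# A cross peak 8 times weaker than the hypothetical 2.5 Å reference pair.
d = noe_distance(1.0, 8.0, 2.5)
print(round(d, 2), restraint_bounds(d))
```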

For proteins of 100 residues or less, conventional homonuclear 2D NMR methods can be applied with a considerable degree of success (1-4, 10, 11). As the number of residues and the molecular weight increase beyond 100 and 12 kDa, respectively, two main obstacles present themselves, which have made it necessary to extend the 2D NMR techniques to higher dimensions and to develop new approaches. First, the increased spectral complexity arising from the presence of a larger number of protons results in extensive chemical shift overlap and degeneracy, rendering the 2D spectra uninterpretable. Second, the rotational correlation time increases with molecular weight, resulting in large linewidths and a concomitant severe decrease in the sensitivity of correlation experiments based on intrinsically small three-bond couplings. These obstacles can be overcome by increasing the dimensionality of the spectra to resolve problems associated with spectral overlap and by simultaneously making use of heteronuclear couplings that are larger than the linewidths to circumvent limitations in sensitivity (5, 6). This approach necessitates uniform 15N and/or 13C labeling of the macromolecule under consideration.
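The sensitivity argument can be made concrete with a simplified single-coupling model: during a transfer delay t, the antiphase term that carries magnetization to the coupled partner grows as sin(πJt), while transverse relaxation damps it as exp(-πΔν t), where Δν = 1/(πT2) is the linewidth. The sketch below assumes a 20 Hz linewidth, a ~7 Hz three-bond 1H-1H coupling and a ~92 Hz one-bond 1H-15N coupling (illustrative values) and finds the best achievable transfer for each; it ignores passive couplings and differential relaxation.

```python
import math


def best_transfer(j_hz, linewidth_hz, t_max=0.2, steps=20000):
    """Best value of |sin(pi*J*t)| * exp(-pi*linewidth*t) on a time grid.

    Returns (efficiency, delay_s). Single-coupling model; passive couplings and
    differences in relaxation between coherences are ignored.
    """
    best_eff, best_t = 0.0, 0.0
    for i in range(1, steps + 1):
        t = t_max * i / steps
        eff = abs(math.sin(math.pi * j_hz * t)) * math.exp(-math.pi * linewidth_hz * t)
        if eff > best_eff:
            best_eff, best_t = eff, t
    return best_eff, best_t


LINEWIDTH = 20.0  # Hz, assumed linewidth for a larger protein
for label, j in (("3J(HH) ~7 Hz", 7.0), ("1J(NH) ~92 Hz", 92.0)):
    eff, t = best_transfer(j, LINEWIDTH)
    print(f"{label}: max transfer {eff:.2f} at {t * 1000:.1f} ms")
```

With these assumptions the one-bond coupling transfers several times more magnetization, and does so in a much shorter delay, than the three-bond coupling.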

The concept of increasing spectral dimensionality to extract information can perhaps most easily be understood by analogy (5). Consider, for example, the Encyclopaedia Britannica. In a one-dimensional representation, all the information (i.e. words and sentences arranged in a particular set order) present in the encyclopedia would be condensed into a single line. If this line were expanded to two dimensions in the form of a page, the odd word might be resolved, but the vast majority would still be superimposed on each other. When this page is expanded into a book (i.e. three dimensions) comprising a set number of lines and words per page, as well as a fixed number of pages, some pages may become intelligible, but many words will still lie on top of each other. The final expansion to the multi-volume book (i.e. four dimensions) then makes it possible to extract in full all the information present in the individual entries of the encyclopedia.


APPLICATION OF MULTIDIMENSIONAL HETERONUCLEAR
NUCLEAR MAGNETIC RESONANCE SPECTROSCOPY TO
PROTEIN STRUCTURE DETERMINATION

The design and implementation of higher dimensionality NMR experiments can be carried out by the appropriate combination of 2D NMR experiments, as illustrated schematically in Fig. 1.

A 3D experiment is constructed from two 2D pulse schemes by leaving out the detection period of the first experiment and the preparation pulse of the second (12). This results in a pulse train comprising two independently incremented evolution periods t1 and t2, two corresponding mixing periods M1 and M2, and a detection period t3. Similarly, a 4D experiment is obtained by combining three 2D experiments in an analogous fashion. Thus, conceptually, n-dimensional NMR can be conceived as a straightforward extension of 2D NMR. The real challenge of 3D and 4D NMR, however, is two-fold: first, to ascertain which 2D experiments should be combined to best advantage; and second, to design the pulse sequences in such a way that undesired artifacts, which may severely interfere with the interpretation of the spectra, are removed. This task is far from trivial.
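The construction rule can be expressed compactly. The sketch below chains 2D building blocks labeled A, B, C (labels as in Fig. 1) by keeping the preparation period only of the first block and the detection period only of the last; it produces a textual outline of the pulse train, not an executable pulse program.

```python
def combine_2d(blocks):
    """Outline the pulse train obtained by chaining 2D experiments.

    Each 2D block X reads P_X - E_X(t) - M_X - D_X(t). Following the rule in the text,
    only the first block keeps its preparation period and only the last keeps its
    detection period; the time variables are numbered t1, t2, ... in order.
    """
    if not blocks:
        raise ValueError("need at least one 2D experiment")
    parts = [f"P_{blocks[0]}"]
    for i, x in enumerate(blocks, start=1):
        parts += [f"E_{x}(t{i})", f"M_{x}"]
    parts.append(f"D_{blocks[-1]}(t{len(blocks) + 1})")
    return " - ".join(parts)


for blocks in (["A"], ["A", "B"], ["A", "B", "C"]):
    print(f"{len(blocks) + 1}D:", combine_2d(blocks))
```

For three blocks this reproduces the 4D scheme of Fig. 1: P_A - E_A(t1) - M_A - E_B(t2) - M_B - E_C(t3) - M_C - D_C(t4).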

Heteronuclear 3D and 4D NMR experiments exploit a series of large one-bond heteronuclear couplings for magnetization transfer through bonds; these couplings are summarized in Fig. 2.

This, together with the fact that the 1H nucleus is always detected, renders these experiments very sensitive. Indeed, high quality 3D and 4D heteronuclear-edited spectra can easily be obtained on samples of 1-2 mM uniformly labeled protein in a time frame that is limited solely by the number of increments that have to be collected for appropriate digitization and the number of phase cycling steps that have to be used to reduce artifacts to an acceptably low level. Typical measurement times are 1 to 3 days for 3D experiments and 2.5 to 5 days for 4D ones. A detailed technical review of heteronuclear multi-dimensional NMR has been provided by Clore & Gronenborn (13) and Bax & Grzesiek (14).

[Figure 1 schematic: pulse-sequence outlines for 2D, 3D and 4D experiments; the 4D outline reads PA - EA(t1) - MA - EB(t2) - MB - EC(t3) - MC - DC(t4).]

Figure 1. General representation of pulse sequences used in multi-dimensional NMR, illustrating the relationship between the basic schemes used to record 2D, 3D and 4D NMR spectra. Note how 3D and 4D experiments are constructed by the appropriate linear combination of 2D ones. Abbreviations: P, preparation; E, evolution; M, mixing; and D, detection. In 3D and 4D NMR, the evolution periods are incremented independently.

Three- and Four-Dimensional Heteronuclear NMR

[Figure 2 schematic: peptide backbone annotated with one-bond couplings of about 140 Hz (1J_CH), 90-100 Hz (1J_NH), 55 Hz (1J_CαC'), 30-40 Hz (1J_CC), 15 Hz (1J_NC') and 11 Hz (1J_NCα).]

Figure 2. Summary of the one-bond heteronuclear couplings along the polypeptide chain utilized in 3D and 4D NMR experiments.
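Why measurement time is governed by digitization and phase cycling follows from simple arithmetic: with States-type (hypercomplex) quadrature, each indirect dimension with N complex points requires 2N FIDs, and every FID is repeated for the number of scans dictated by the phase cycle. The point counts, scan numbers and 1.2 s per scan below are assumptions chosen for illustration.

```python
def measurement_time_days(complex_points, scans, seconds_per_scan):
    """Approximate acquisition time (days) of an nD experiment.

    complex_points: complex increments in each indirect dimension; with States-type
    quadrature each needs two FIDs, so the FID count is the product of 2 * N_i.
    scans: transients per FID (set by the phase cycle).
    seconds_per_scan: recycle delay plus acquisition time.
    """
    fids = 1
    for n in complex_points:
        fids *= 2 * n
    return fids * scans * seconds_per_scan / 86400.0


# Hypothetical settings for a 3D (two indirect dimensions) and a 4D (three) experiment.
print(f"3D: {measurement_time_days([32, 64], 16, 1.2):.2f} days")   # 3D: 1.82 days
print(f"4D: {measurement_time_days([8, 64, 8], 8, 1.2):.2f} days")  # 4D: 3.64 days
```

Doubling the scans per FID, or the points in any one indirect dimension, doubles the total time, which is why 4D experiments are recorded with very few increments in each indirect dimension.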

Many of the 3D and 4D experiments are based on heteronuclear editing of 1H-1H experiments, so that the general appearance of conventional 2D experiments is preserved and the total number of cross peaks present is the same as that in the 2D equivalents (5, 6, 13, 14). The progression from a 2D spectrum to 3D and 4D heteronuclear-edited spectra is depicted schematically in Fig. 3.

Consider, for example, the cross peaks involving a particular 1H frequency in a 2D NOESY spectrum, a 3D 15N- or 13C-edited NOESY spectrum, and finally a 4D 15N/13C- or 13C/13C-edited NOESY spectrum. In the 2D spectrum a series of cross peaks will be seen from the originating proton frequencies in the F1 dimension to the single destination 1H frequency along the F2 dimension. From the 2D experiment it is impossible to ascertain whether these NOEs involve only a single destination proton or several destination protons with identical chemical shifts. By spreading the spectrum into a third dimension according to the chemical shift of the heteronucleus attached to the destination proton(s), NOEs involving different destination protons will appear in distinct 1H-1H planes of the 3D spectrum. Thus each interaction is simultaneously labeled by three chemical shift coordinates along three orthogonal axes of the spectrum. The projection of all these planes onto a single plane yields the corresponding 2D spectrum. For the purposes of sequential assignment, heteronuclear-edited 3D spectra are often sufficient for analysis. However, when the goal of the analysis is to assign NOEs between protons far apart in the sequence, a 3D 15N- or 13C-edited NOESY spectrum will often prove inadequate. This is because the originating protons are only specified by their chemical shifts, and more often than not, there are several protons that resonate at the same frequencies. For example, in the case of the 153-residue protein interleukin-1β, there are about 60 protons that resonate in a 0.4 ppm interval between 0.8 and 1.2 ppm. Such ambiguities can then be resolved by spreading out the 3D spectrum still further into a fourth dimension according to the chemical shift of the heteronucleus attached to the originating protons, so that each NOE interaction is simultaneously labeled by four chemical shift coordinates along four orthogonal axes, namely those of the originating and destination protons and those of the corresponding heteronuclei directly bonded to these protons (15-17). The result is a 4D spectrum in which each plane of the 3D spectrum constitutes a cube in the 4D spectrum.
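The gain from each added dimension can be illustrated with a small simulation. In the hypothetical sketch below, NOE cross peaks are placed at random coordinates (originating 1H crowded into the 0.8-1.2 ppm window quoted above, destination NH 1H, destination 15N, originating 13C), and a peak counts as resolved only if no other peak lies within the assumed tolerance on every axis considered. The ranges, tolerances and peak count are assumptions chosen for illustration, not interleukin-1β data.

```python
import random


def simulate_peaks(n_peaks, ranges, seed=0):
    """Uniformly random peak coordinates (ppm) within the given per-axis ranges."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(lo, hi) for lo, hi in ranges) for _ in range(n_peaks)]


def resolved_fraction(peaks, tolerances, dims):
    """Fraction of peaks with no other peak within tolerance on each of the first `dims` axes."""
    resolved = 0
    for i, p in enumerate(peaks):
        overlapped = any(
            all(abs(p[k] - q[k]) <= tolerances[k] for k in range(dims))
            for j, q in enumerate(peaks)
            if j != i
        )
        if not overlapped:
            resolved += 1
    return resolved / len(peaks)


# Axes: originating 1H, destination NH 1H, destination 15N, originating 13C (all ppm).
RANGES = [(0.8, 1.2), (6.5, 10.5), (105.0, 135.0), (15.0, 30.0)]
TOLERANCES = [0.02, 0.02, 0.3, 0.3]

peaks = simulate_peaks(300, RANGES, seed=1)
for dims in (2, 3, 4):
    print(f"{dims}D: {resolved_fraction(peaks, TOLERANCES, dims):.2f} of peaks resolved")
```

Because a pair of peaks that overlaps in 4D must also overlap in every lower-dimensional projection, the resolved fraction can only grow as axes are added, mirroring the progression of Fig. 3.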

Figure 3. Schematic illustration of the progression and relationship between 2D, 3D and 4D heteronuclear NMR experiments. The closed circles represent NOE cross peaks. In the example shown there are 11 NOEs originating from 11 different protons in the F1 dimension to a single frequency position in the F2 dimension. In the 2D spectrum, it is impossible to ascertain whether there is only one destination proton or several in the F2 dimension. By spreading the spectrum into a third dimension (labeled F2), according to the chemical shift of the heteronucleus attached to the destination proton, it can be seen that the NOEs now lie in three distinct 1H(F1)-1H(F3) planes, indicating that three different destination protons are involved. However, the 1H chemical shifts still provide the only means of identifying the originating protons. Hence the problem of spectral overlap still prevents the unambiguous assignment of these NOEs. By extending the dimensionality of the spectrum to four, each NOE interaction is labeled by four chemical shifts along four orthogonal axes. Thus, the NOEs in each plane of the 3D spectrum are now spread over a cube in the 4D spectrum according to the chemical shift of the heteronucleus directly attached to the originating protons. Adapted from ref (15).

For illustration purposes it is also useful to compare the type of information that can be extracted from a very simple system using 2D, 3D and 4D NMR. Consider a molecule with only two NH and two aliphatic protons in which only one NH proton is close to an aliphatic proton. In addition, the chemical shifts of the NH protons are degenerate, as are those of the aliphatic protons, so that only two resonances are seen in the one-dimensional spectrum. In the 2D NOESY spectrum, an NOE will be observed between the resonance position of the NH protons and the resonance position of the aliphatic protons, but it will be impossible to ascertain which one of the four possible NH-aliphatic proton combinations gives rise to the NOE. By spreading the spectrum into a third dimension, for example by the chemical shift of the 15N atoms attached to the NH protons, the number of possibilities will be reduced to two, provided, of course, that the chemical shifts of the two nitrogen atoms are different. Finally, when the fourth dimension corresponding to the chemical shift of the 13C atoms attached to the aliphatic protons is introduced, a unique assignment of the NH-aliphatic proton pair giving rise to the NOE can be made.
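The counting argument in this toy system can be checked by enumeration. The shifts below are hypothetical: both NH protons share one 1H shift and both aliphatic protons share another, while the attached 15N and 13C shifts differ, as the paragraph requires.

```python
from itertools import product

# Hypothetical shifts (ppm). The 1H shifts are degenerate within each pair;
# the attached heteronuclear shifts are not.
NH = {"NH1": (8.30, 118.0), "NH2": (8.30, 125.0)}    # (1H, attached 15N)
ALIPHATIC = {"Ha": (4.50, 55.0), "Hb": (4.50, 62.0)}  # (1H, attached 13C)

# The single observed NOE as (NH 1H, 15N, aliphatic 1H, 13C); here it arises from NH2-Ha.
OBSERVED = (8.30, 125.0, 4.50, 55.0)


def candidate_pairs(use_15n, use_13c, tol=0.01):
    """NH-aliphatic pairs consistent with the observed NOE, given which shifts are resolved."""
    hits = []
    for (nh, (h_n, n15)), (al, (h_a, c13)) in product(NH.items(), ALIPHATIC.items()):
        if abs(h_n - OBSERVED[0]) > tol or abs(h_a - OBSERVED[2]) > tol:
            continue  # both 1H shifts must always match
        if use_15n and abs(n15 - OBSERVED[1]) > tol:
            continue
        if use_13c and abs(c13 - OBSERVED[3]) > tol:
            continue
        hits.append((nh, al))
    return hits


print(len(candidate_pairs(False, False)))  # 2D: 4
print(len(candidate_pairs(True, False)))   # 3D (15N added): 2
print(candidate_pairs(True, True))         # 4D: [('NH2', 'Ha')]
```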

Fig. 4A presents a portion of the 2D 15N-edited NOESY spectrum of interleukin-1β (153 residues), illustrating NOE interactions between the NH protons along the F2 axis and the CαH protons along the F1 dimension. Although a large number of cross peaks can be resolved, many of them have identical chemical shifts in one or the other dimension. For example, there are 15 cross peaks involving NH protons at an F2(1H) chemical shift of ~9.2 ppm. A single 1H(F1)-1H(F3) plane of the 3D 15N-edited NOESY spectrum of interleukin-1β at δ15N(F2) = 123.7 ppm is shown in Fig. 4B. Not only is the number of cross peaks in this slice small, but at δ1H(F3) ~ 9.2 ppm there is only a single cross peak involving one NH proton. The correlations observed in the 15N-edited NOESY spectrum are through-space ones. Intraresidue correlations from the NH protons to the CαH and CβH protons can similarly be resolved using a 3D 15N-edited HOHAHA spectrum, in which efficient isotropic mixing sequences are used to transfer magnetization between protons via three-bond 1H-1H couplings.

The 3D 15N-edited NOESY and HOHAHA spectra constitute only one of several versions of a 3D heteronuclear-edited spectrum. Many alternative through-bond pathways can be utilized to great effect. Consider, for example, the delineation of amino acid spin systems, which involves grouping those resonances that belong to the same residue. In 2D NMR, correlation experiments are used to delineate either direct or relayed connectivities via small three-bond 1H-1H couplings. Even for proteins of 50-60 residues, it can be difficult to delineate long-chain amino acids such as Lys and Arg in this manner. In heteronuclear 3D NMR an alternative pathway can be employed, which involves transferring magnetization first from a proton to its directly attached carbon atom via the large 1J_CH coupling (~130 Hz), followed by either direct or relayed transfer of magnetization along the carbon chain via the 1J_CC couplings (~30-40 Hz), before transferring the magnetization back to protons (18-20). An example of such a spectrum is the so-called HCCH-TOCSY shown in Fig. 4C. The 1H(F1)-1H(F3) plane at δ13C(F2) = 59 ppm illustrates both direct and relayed connectivities along various side chains originating from CαH protons. As expected, the resolution of the spectrum is excellent and there is no spectral overlap. Just as importantly, however, the sensitivity of the experiment is extremely high, and complete spin systems are readily identified in interleukin-1β even for long side chains, such as those of the two lysine residues shown in the figure. Indeed, by analysing spectra of this kind, it was possible to obtain complete 1H and 13C assignments for the side chains of interleukin-1β (21).
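The relay idea can be pictured as a walk along the carbon skeleton, one 1J_CC step at a time. The sketch below is a purely topological simplification that ignores how transfer amplitudes depend on the isotropic mixing time; it counts the steps from Cα to each lysine side-chain carbon. The carbonyl carbon is omitted because it carries no proton to detect.

```python
from collections import deque

# Lysine side-chain carbon skeleton; each edge stands for a one-bond 13C-13C coupling.
LYS_CARBONS = {
    "CA": ["CB"],
    "CB": ["CA", "CG"],
    "CG": ["CB", "CD"],
    "CD": ["CG", "CE"],
    "CE": ["CD"],
}


def relay_steps(graph, start):
    """Breadth-first count of 13C-13C transfer steps from `start` to every reachable carbon."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in graph[atom]:
            if neighbour not in steps:
                steps[neighbour] = steps[atom] + 1
                queue.append(neighbour)
    return steps


# One step = direct connectivity; two or more steps = relayed connectivity.
print(relay_steps(LYS_CARBONS, "CA"))  # {'CA': 0, 'CB': 1, 'CG': 2, 'CD': 3, 'CE': 4}
```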

3D NMR also permits one to devise experiments for sequential assignment that are based solely on through-bond connectivities via heteronuclear couplings (13, 14, 22, 23) and thus do not rely on the NOESY experiment. This becomes increasingly important for larger proteins, as the types of connectivities observed in these correlation experiments are entirely predictable, whereas in the NOESY spectrum, which relies solely on the close proximity of protons, it may be possible to confuse sequential connectivities with long-range ones. These 3D heteronuclear correlation experiments are of the triple resonance variety and make use of one-bond 13CO(i-1)-15N(i), 15N(i)-13Cα(i) and 13Cα(i)-13CO(i) couplings, as well as two-bond 13Cα(i-1)-15N(i) couplings. In this manner multiple independent pathways for linking the resonances of one residue with those of its adjacent neighbour are available, thereby avoiding ambiguities in the sequential assignment.

Figure 4. Example of 2D and 3D spectra of interleukin-1β recorded at 600 MHz. The 2D spectrum in panel A shows the NH(F2 axis)-CαH(F1 axis) region of a 2D 15N-edited NOESY spectrum. The same region of a single NH(F3)-1H(F1) plane of the 3D 15N-edited NOESY at δ15N(F2) = 123.7 ppm is shown in panel B. The actual 3D spectrum comprises 64 such planes, and projection of these onto a single plane would yield the same spectrum as in (A). Panel C shows a single 1H(F3)-1H(F1) plane of the 3D HCCH-TOCSY spectrum at δ13C(F2) = 38.3 ± n·SW (where SW is the spectral width of 20.71 ppm in the 13C dimension), illustrating both direct and relayed connectivities originating from the CαH protons. Note how easy it is to delineate complete spin systems of long side chains such as Lys (i.e. cross peaks to the CβH, CγH, CδH and CεH protons are observed), owing to the fact that magnetization along the side chain is transferred via large 1J_CC couplings. Several features of the HCCH-TOCSY spectrum should be pointed out. First, extensive folding is employed, which does not obscure analysis because 13C chemical shifts for different carbon types are located in characteristic regions of the 13C spectrum with little overlap. Second, the spectrum is edited according to the chemical shift of the heteronucleus attached to the originating proton rather than the destination one. Third, multiple cross checks on the assignments are readily made by looking for the symmetry-related peaks in the planes corresponding to the 13C chemical shifts of the destination protons in the original slice. Adapted from ref (5).
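Folding in the 13C dimension, noted in the Figure 4 caption, means that a peak's observed position differs from its true shift by an integer multiple of the spectral width. A minimal sketch: with the caption's values, one fold (n = 1) maps the 38.3 ppm plane onto 38.3 + 20.71 = 59.01 ppm, matching the 59 ppm plane quoted for Fig. 4C. The 0-80 ppm search window in the example is an assumption; choosing among the candidates relies on the characteristic carbon-type regions the caption mentions.

```python
import math


def unfold(observed_ppm, sweep_width_ppm, n):
    """Position of a peak that has been folded n times: delta = observed + n * SW."""
    return observed_ppm + n * sweep_width_ppm


def unfolded_candidates(observed_ppm, sweep_width_ppm, low_ppm, high_ppm):
    """All possible true shifts of a folded peak that fall within [low_ppm, high_ppm]."""
    n_min = math.ceil((low_ppm - observed_ppm) / sweep_width_ppm)
    n_max = math.floor((high_ppm - observed_ppm) / sweep_width_ppm)
    return [unfold(observed_ppm, sweep_width_ppm, n) for n in range(n_min, n_max + 1)]


# Values from the Figure 4C caption: observed 38.3 ppm, SW = 20.71 ppm.
print(round(unfold(38.3, 20.71, 1), 2))  # 59.01
print([round(x, 2) for x in unfolded_candidates(38.3, 20.71, 0.0, 80.0)])  # [17.59, 38.3, 59.01, 79.72]
```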

In practice, only a limited number of 3D triple and double resonance experiments need to be performed to obtain complete assignments. In our experience, the following eight 3D experiments not only provide all the information required, but are also characterized by high sensitivity and can be recorded in as little as two weeks of measuring time. Specifically, the 3D CBCA(CO)NH and HBHA(CBCACO)NH experiments (24, 25) are used to correlate the chemical shifts of Cα(i-1)/Cβ(i-1) and Hα(i-1)/Hβ(i-1), respectively, of residue i-1 with the 15N(i)/NH(i) chemical shifts of residue i; and the complementary 3D C(CO)NH and H(CCO)NH experiments (26) are used to correlate the chemical shifts of the aliphatic side chain 13C and 1H resonances, respectively, of residue i-1 with the 15N(i)/NH(i) chemical shifts of residue i. In the first two experiments, magnetization originating on Cβ is transferred to Cα by a COSY mixing pulse, while in the second pair of experiments, magnetization is transferred from a side chain carbon along the carbon chain to Cα via isotropic mixing. Intraresidue correlations to the NH group can be obtained from the 3D CBCANH (27) and 15N-edited HOHAHA (28) experiments. The 3D CBCANH experiment correlates the chemical shifts of Cα(i)/Cβ(i) (as well as those of Cα(i-1)/Cβ(i-1), which invariably give rise to weaker cross peaks) with the 15N(i)/NH(i) chemical shifts of residue i; the 3D 15N-separated HOHAHA experiment correlates the chemical shifts of the side chain protons of residue i with the 15N(i)/NH(i) chemical shifts of residue i. Finally, the 3D HCCH-COSY and HCCH-TOCSY experiments (21, 22) can be used to confirm and obtain complete 1H and 13C assignments of the side chains.
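The way the CBCA(CO)NH and CBCANH data link neighbouring residues can be sketched as a shift-matching walk: the Cα/Cβ(i-1) pair seen at one NH is matched against the intraresidue Cα/Cβ pairs of all spin systems. The spin-system labels, shifts and 0.2 ppm tolerance below are hypothetical. In real data, degenerate Cα/Cβ pairs make single matches ambiguous, which is why the other experiments listed above serve as independent cross checks.

```python
# Hypothetical spin systems, labeled by their NH peaks.
# INTRA[s]: (Ca, Cb) of the residue itself (from CBCANH).
# PREV[s]:  (Ca, Cb) of the preceding residue, seen at the same NH (from CBCA(CO)NH).
INTRA = {"a": (56.1, 30.2), "b": (52.3, 19.0), "c": (61.8, 69.7), "d": (58.4, 38.9)}
PREV = {"a": (44.9, 41.0), "b": (61.9, 69.6), "c": (56.0, 30.3), "d": (52.2, 19.1)}


def next_spin_system(current, intra, prev, tol=0.2):
    """Spin system whose i-1 shifts match the intraresidue shifts of `current`, if unique."""
    ca, cb = intra[current]
    hits = [s for s, (pca, pcb) in prev.items()
            if s != current and abs(pca - ca) <= tol and abs(pcb - cb) <= tol]
    return hits[0] if len(hits) == 1 else None


def walk_chain(start, intra, prev, tol=0.2):
    """Follow unique sequential matches from `start` until the chain breaks or loops."""
    chain = [start]
    while True:
        nxt = next_spin_system(chain[-1], intra, prev, tol)
        if nxt is None or nxt in chain:
            return chain
        chain.append(nxt)


print(walk_chain("a", INTRA, PREV))  # ['a', 'c', 'b', 'd']
```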

The power of 4D heteronuclear NMR spectroscopy for resolving interactions that could not be resolved in lower dimensional spectra is illustrated in Fig. 5 by the 4D 13C/13C-edited NOESY spectrum of interleukin-1β (16).

Fig. 5A shows a small portion of the aliphatic region between 1 and 2 ppm of a conventional 2D NOESY spectrum of interleukin-1β. The overlap is so great that no single individual cross peak can be resolved. One might therefore wonder just how many NOE interactions are actually superimposed, for example, at the 1H chemical shift coordinates of the letter X at 1.39 (F1) and 1.67 (F2) ppm. A 1H(F2)-1H(F4) plane of the 4D spectrum at δ13C(F1), δ13C(F3) = 44.3, 34.6 ppm is shown in panel B, and the square box at the top right-hand side of this panel encloses the region between 1 and 2 ppm. Only two cross peaks are present in this region, and the arrow points to a single NOE between two side-chain protons of Lys-77 with the same 1H chemical shift coordinates as the letter X in panel A. All the other NOE interactions at the same 1H chemical shift coordinates can be determined by inspection of a single 13C(F1)-13C(F3) plane taken at δ1H(F2), δ1H(F4) = 1.39, 1.67 ppm. This reveals a total of 7 NOE interactions superimposed at the 1H chemical shift coordinates of the letter X. Another feature of the 4D spectrum is illustrated by the two 1H(F2)-1H(F4) planes at different F1 and F3 frequencies shown in panels C and D. In both cases, there are cross peaks involving protons with identical or nearly identical chemical shifts, namely that between Pro-91 (CαH) and Tyr-90 (CαH), diagnostic of a cis-proline, in panel C, and that between Phe-99 (CβH) and Met-95 (CεH3) in panel D. These interactions could not be resolved in either a 2D spectrum or a 3D 13C-edited spectrum, as they would lie on the spectral diagonal (i.e. the region of the spectrum corresponding to magnetization that has not been transferred from one proton to another). In the 4D spectrum, however, they are easy to observe, provided, of course, that the chemical shifts of the directly bonded 13C nuclei are different.

Because the number of NOE interactions present in each 1H(F2)-1H(F4) plane of 4D 13C/15N- or 13C/13C-edited NOESY spectra is so small, the inherent resolution in a 4D spectrum is extremely high, despite the low level of digitization. Indeed, spectra with equivalent resolution could be recorded at magnetic field strengths considerably lower than 600 MHz, although this would obviously lead to a reduction in sensitivity. Further, it can be calculated that 4D spectra with virtually no resonance overlap and good sensitivity can be obtained on proteins with as many as 400 residues. Thus, once complete 1H, 15N and 13C assignments are obtained, analysis of 4D spectra should permit the automated assignment of almost all NOE interactions.
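Once the shift table is complete, assigning a 4D 13C/13C-edited NOESY peak reduces to looking up which protons match each (1H, attached 13C) coordinate pair. The sketch below is a hypothetical illustration of that lookup, not the authors' software; the atom labels, shifts and tolerances are invented for the example, and the axis order (13C F1, 1H F2, 13C F3, 1H F4) follows the plane labels used for Fig. 5.

```python
# Hypothetical assignment table: proton label -> (1H shift, attached 13C shift), ppm.
ASSIGNED = {
    "res12 HB2": (1.42, 41.0),
    "res30 HG2": (1.71, 30.5),
    "res45 HD1": (1.42, 22.8),
    "res67 HB3": (1.71, 27.9),
}


def matching_atoms(h_ppm, c_ppm, table, tol_h=0.02, tol_c=0.3, use_carbon=True):
    """Protons whose 1H shift (and, optionally, attached 13C shift) match within tolerance."""
    return [name for name, (h, c) in table.items()
            if abs(h - h_ppm) <= tol_h and (not use_carbon or abs(c - c_ppm) <= tol_c)]


def assign_4d_peak(peak, table, use_carbon=True):
    """Candidate (origin, destination) pairs for a peak at (C_F1, H_F2, C_F3, H_F4)."""
    c1, h2, c3, h4 = peak
    origins = matching_atoms(h2, c1, table, use_carbon=use_carbon)
    dests = matching_atoms(h4, c3, table, use_carbon=use_carbon)
    return [(o, d) for o in origins for d in dests if o != d]


peak = (41.0, 1.42, 30.5, 1.71)
print(len(assign_4d_peak(peak, ASSIGNED, use_carbon=False)))  # 1H shifts only: 4
print(assign_4d_peak(peak, ASSIGNED))  # [('res12 HB2', 'res30 HG2')]
```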
CONCLUSION

In this chapter we have summarized the recent developments in heteronuclear 3D and 4D NMR that have been designed to extend NMR methodology to medium-sized proteins in the 15-30 kDa range. The underlying principle of this approach consists of extending the dimensionality of the spectra to obtain dramatic improvements in spectral resolution while simultaneously exploiting large heteronuclear couplings to circumvent problems associated with larger linewidths. A key feature of all these experiments is that they do not result in any increase in the number of observed cross peaks relative to their 2D counterparts. Hence, the improvement in resolution is achieved without raising the spectral complexity, rendering data interpretation straightforward. Thus, for example, in 4D heteronuclear-edited NOESY spectra, the NOE interactions between proton pairs are labeled not only by the 1H chemical shifts but also by the corresponding chemical shifts of their directly bonded heteronuclei along four orthogonal axes of the spectrum. Also important in terms of practical applications is the high sensitivity of these experiments, which makes it feasible to obtain high quality spectra in a relatively short time frame on 1-2 mM protein samples uniformly labeled with 15N and/or 13C.


Figure 5. Comparison of 2D and 4D NMR spectra of interleukin-1β recorded at 600 MHz (16). The region
between 1 and 2 ppm of the 2D NOESY spectrum is shown in (A). 1H(F2)-1H(F4) planes at several 13C(F1) and
13C(F3) frequencies of the 4D NOESY spectrum are shown in panels B to D. No individual cross peaks can be
observed in the 2D spectrum; the letter X has coordinates of 1.39 and 1.67 ppm. In contrast, only two cross
peaks are observed in the boxed region in panel B between 1 and 2 ppm, one of which (indicated by an arrow)
has the same 1H coordinates as the letter X. Further analysis of the complete 4D spectrum reveals the presence
of 7 NOE cross peaks superimposed at the coordinates of the letter X. This can be ascertained by looking at the
13C(F1)-13C(F3) plane taken at the 1H coordinates of X. True diagonal peaks, corresponding to magnetization
that has not been transferred from one proton to another, as well as intense NOE peaks involving protons
attached to the same carbon atom (i.e. methylene protons), appear in only a single 1H(F2)-1H(F4) plane of each
13C(F1), 1H(F2), 1H(F4) cube, at the carbon frequency where the originating and destination carbon atoms
coincide (i.e. at F1 = F3). Thus, these intense resonances no longer obscure NOEs between protons with similar
or degenerate chemical shifts. Two examples of such NOEs can be seen in panels C (between the CαH protons of
Pro-91 and Tyr-90) and D (between one of the CβH protons of Phe-77 and the methyl protons of Met-95). These
various planes of the 4D spectrum also illustrate another key aspect of 3D and 4D NMR, namely the importance
of designing the pulse scheme to optimally remove undesired artifacts, which may severely interfere with the
interpretation of the spectra. Thus, while the 4D 13C/13C-edited NOESY experiment is conceptually analogous to
a 4D 13C/15N-edited one, the design of a suitable pulse scheme is actually much more complex in the 13C/13C
case. This is because a large number of spurious magnetization transfer pathways can lead to observable
signals in the homonuclear case. For example, in the 4D 15N/13C-edited case there are no "diagonal peaks"
(which would correspond to magnetization that has not been transferred from one hydrogen to another), as the
double heteronuclear filtering (i.e. 13C and 15N) is extremely efficient at completely removing these normally
very intense and uninformative resonances. Such a double filter is not available in the 13C/13C case, so both
additional pulses and phase cycling are required to suppress magnetization transfer through these pathways.
This task is far from trivial, as the number of phase cycling steps in 4D experiments is severely limited by
the need to keep the measurement time down to practical levels (i.e. less than 1 week). The most efficient way
of obtaining artifact-free spectra is through the incorporation of pulsed field gradients to suppress
undesired coherence transfer pathways (29). Indeed, inclusion of 6 pulsed field gradients into the original
pulse scheme of Clore et al. (16) reduces the phase cycle from eight to two steps (30). The results of such care
in pulse design can be clearly appreciated from the artifact-free planes shown in panels B-D. However, when a
4D 13C/13C-edited NOESY spectrum is recorded with the same pulse scheme as that used in the 4D 15N/13C
experiment (with the obvious replacement of 15N pulses by 13C pulses), a large number of spurious peaks are
observed along a pseudo-diagonal at δ1H(F2) = δ1H(F4) in planes where the carbon frequencies of the
originating and destination protons do not coincide. As a result, it becomes virtually impossible under these
circumstances to distinguish artifacts from NOEs between protons with the same 1H chemical shifts, as was
possible with complete confidence in panels C and D.
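Simple arithmetic shows why the number of phase-cycling steps is so constrained. The sketch below is a minimal illustration, not a description of the actual experiments. It assumes hypercomplex (States-type) quadrature detection in each of the three indirect dimensions, which requires two FIDs per complex increment in each dimension. It also assumes that every FID needs at least one complete phase cycle. The function name and any increment counts or scan times passed to it are hypothetical.

```python
def min_measurement_days(complex_increments, phase_cycle_steps, scan_time_s):
    """Lower bound on the acquisition time of a multidimensional experiment.

    Assumptions (illustrative only): hypercomplex (States-type) quadrature,
    i.e. two FIDs per complex increment in every indirect dimension, and one
    complete phase cycle (phase_cycle_steps scans) recorded for every FID.
    """
    fids = 1
    for n in complex_increments:
        fids *= 2 * n  # real and imaginary FID for each increment
    total_scans = fids * phase_cycle_steps
    return total_scans * scan_time_s / 86400.0  # seconds -> days
```

The time is linear in the number of phase-cycle steps. For any fixed set of increments, cutting the cycle from eight to two steps therefore shortens the minimum measurement time fourfold.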




Just as 2D NMR opened the application of NMR to the structure determination
of small proteins of less than about 100 residues, 3D and 4D heteronuclear NMR provide
the means of extending the methodology to medium-sized proteins in the 150 to 300
residue range. Indeed, the determination in 1991 of the first high-resolution structure of
a protein in the 15-20 kDa range, namely the cytokine interleukin-1β (153 residues and
18 kDa), using 3D and 4D heteronuclear NMR (31) demonstrated beyond doubt that the
technology is now available for obtaining the structures of such medium-sized proteins
at a level of accuracy and precision comparable to the best results attainable for
small proteins. Subsequently, a number of other medium-sized protein structures have
been determined using these methods, including interleukin-4 (32-34), glucose permease
IIA (35), a complex of calmodulin with a target peptide (36), a complex of cyclophilin
and cyclosporin A (37), a specific complex of the transcription factor GATA-1 (38) with
its DNA target site, human macrophage inflammatory protein 1β (39), and the
oligomerization domain of p53 (40).
INTRODUCTION

The design of new computational procedures to predict molecular complexes is a fast-developing
area, stimulated by the growing demands of researchers working in various fields
of molecular biology who are looking for more powerful tools for their investigations. The
problem for molecular recognition (docking) approaches may be briefly formulated as
follows: how can two molecules with known 3D structures be matched in order to predict the
configuration of their complex? In the general case, no additional prior knowledge of
binding sites is assumed to be available.

The algorithms for molecular recognition in a “ligand-receptor” system (for a review,
see Refs. 1-4) include, and sometimes combine, approaches which concentrate mostly on
energetic considerations (5-10) and procedures based on a search for steric fit (11-19),
including those which make use of physico-chemical surface complementarity (20,21). The
“rigid body” approach is justified in most cases of known macromolecular 3D structures
(22,23). However, procedures which explicitly take into account ligand (e.g. oligopeptide)
flexibility (24) have also started to appear (9,25).

The inherent inaccuracy in the 3D structures of molecules is one of
the most serious obstacles which docking procedures have to overcome. This inaccuracy has
both a “natural” origin (internal flexibility) and a “technical” one (poor quality of the
X-ray data), which is often a consequence of the same flexibility. This problem has been
treated by introducing a certain tolerance to the surface of molecules (14,17), reducing
atom-atom interactions to residue-residue ones (22), or truncating certain amino-acid
side chains (16). However, even such “radical” methods as the elimination of non-hydrophobic
atoms (21) apparently will not help when conformational changes upon complex formation
are really substantial, or when X-ray data on one or both (macro)molecules are not available
and the structure, based on alternative sources (NMR, modeling), is not well defined.

In order to address the problem of poorly determined structures in molecular
recognition, we designed a direct computer experiment with molecules totally deprived of
any structural features smaller than 7 Å. For this purpose, we modified our previously
developed docking procedure (17), which predicts complex configurations on the basis of
surface complementarity. The modified procedure was applied to various known protein
complexes taken from the Brookhaven Protein Data Bank. In most cases, except antigen-antibody
complexes, a pronounced trend towards the correct complex configuration was
clearly indicated and the real binding sites were predicted. The distinction between the
prediction of the antigen-antibody complexes and the other molecular pairs may reflect
important differences in the principles of complex formation (26,27).


METHODS 

Basic  Docking  Algorithm 

The  primary  molecular  recognition  algorithm  is  described  in  detail  elsewhere  (17). 
Briefly,  the  3D  atomic  structures  of  “ligand”  and  “receptor”  molecules  are  projected  on  a 
3D  grid.  The  surface  of  the  receptor  molecule  is  represented  by  a  layer  of  small  positive 
numbers  and  the  inner  part  is  represented  by  large  negative  numbers.  The  ligand  molecule 
is  represented  by  positive  numbers  only.  Everywhere  on  the  grid  outside  the  two  molecules 
there are zero values. Thus, when the projected images of the molecules are translated
relative to one another and the point-by-point product is summed over the whole grid, the
absence of contact between projection points of the molecules contributes zero, contact
between the ligand and the surface of the receptor contributes a small positive number, and
contact between the ligand and the inner part of the receptor contributes a large negative
number (a penalty for penetration). All possible orientations of the ligand are sampled at a
given angular interval. The resulting list of highest-scoring ligand positions relative to the
receptor gives the configurations of the molecular complex with the largest areas of contact
between the ligand and the receptor. The algorithm was used both as described above and in its
“hydrophobic” modification (21) to predict the complex configurations for a number of
molecules with known high-resolution X-ray structures, and proved to be a reliable tool for
docking studies.
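The translational part of this scoring can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions. The molecules are assumed to be already projected onto boolean occupancy grids padded with empty cells. "Surface" is taken to mean an occupied cell with at least one empty face neighbour. The weights 1 and -15 and all function names are hypothetical. The sums are computed here with fast Fourier transforms, which give the same values as an explicit loop over all cyclic translations. The rotational sampling of the original procedure (17) is omitted.

```python
import numpy as np

SURFACE_VALUE = 1.0     # hypothetical small positive weight (receptor surface)
INTERIOR_VALUE = -15.0  # hypothetical large negative penalty (receptor interior)


def receptor_grid(occupied):
    """Occupied cells with an empty face neighbour become surface cells;
    the remaining occupied cells are interior; empty cells stay zero.
    np.roll wraps around, so the grid must be padded with empty cells."""
    occ = np.asarray(occupied, dtype=bool)
    interior = occ.copy()
    for axis in range(3):
        for shift in (1, -1):
            interior &= np.roll(occ, shift, axis=axis)
    grid = np.zeros(occ.shape)
    grid[occ & ~interior] = SURFACE_VALUE
    grid[interior] = INTERIOR_VALUE
    return grid


def ligand_grid(occupied):
    """The ligand is represented by positive numbers only (here 1)."""
    return np.asarray(occupied, dtype=float)


def translation_scores(rec, lig):
    """score[t] = sum_x rec[x] * lig[x - t] for every cyclic translation t."""
    return np.fft.ifftn(np.fft.fftn(rec) * np.conj(np.fft.fftn(lig))).real


def best_translations(rec, lig, k=5):
    """Return the k highest-scoring translations as (index, score) pairs."""
    scores = translation_scores(rec, lig)
    order = np.argsort(scores, axis=None)[::-1][:k]
    return [(np.unravel_index(i, scores.shape), scores.flat[i]) for i in order]
```

To cover orientations, each sampled rotation of the ligand would be re-projected and scored in the same way. The best translations over all orientations would then be kept.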

Modified  Algorithm 

One of the most important parameters of these molecular images is the grid interval,
which sets the resolution, or accuracy, of the molecular representation. In a regular
docking procedure, the grid step is in the range of 0.7-1.7 Å (17,21). Such a grid step
represents an ultimate threshold for details of the molecular structure: no detail smaller
than the grid step is reflected in the 3D grid projection of the molecule, and consequently,
no such detail is taken into account in the search for intermolecular fits. Thus, a natural
property of the grid representation, namely the possibility of varying the grid step, can be
used to study the resolution dependence of molecular recognition.

To explore the possibility of docking low-resolution structures, we set up an ultra-large
grid step of 7 Å (much larger than an atomic radius). Figure 1 shows a typical cross section
through various molecular representations. As can be seen, the high-resolution discrete
molecular image preserves many atom-size shape features of the “realistic” van der Waals
representation. This is not the case for the image obtained on a grid with an ultra-large step
(low-resolution representation), in which it is hard to find any specific shape characteristics
at all. As a consequence, applying the surface recognition algorithm to such molecular images
does not produce any meaningful results distinguishable from noise (Vakser, unpublished
results).
Figure 1. A cross section through different
representations of the β-subunit of human
hemoglobin. In (a) the molecule is represented
by van der Waals spheres. In (b) and (c)
the same cross section is shown for 3D
grid projections with high-resolution (1.7 Å)
and low-resolution (7 Å) grid steps, respectively.


In order to make the low-resolution images more informative without breaching the main
assumption of no structural details below the ultra-large grid step, we modified the
procedure for projecting molecules onto the grid. The details of the modified algorithm, as well
as its implementation, are described elsewhere (28). Briefly, we introduced numbers that represent
the density of atoms within each volume cube (grid element). The density
numbers are naturally smaller at the molecular surface than inside the molecule, because at
the surface only part of the cube is occupied by atoms. The surface values tend to be larger
within deep cavities and smaller at pronounced convexities. The rest of the docking algorithm
remains exactly as in the basic procedure.
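The density projection can be illustrated with a short sketch. It simply counts the atoms that fall into each 7 Å cube, and dividing by the cube volume would turn counts into densities. This is an assumption-laden illustration and not the scheme of Ref. 28. The padding, the rule that marks densely packed cells as receptor "core", the penalty value and the function names are all hypothetical.

```python
import numpy as np


def density_grid(coords, step=7.0, padding=1):
    """Count atoms per cubic grid element of edge `step` (Å).

    coords:  (N, 3) atomic coordinates in Å.
    padding: empty cells added on every side so the molecule never touches
             the grid edge (hypothetical choice).
    Returns the count grid and the Cartesian origin of cell (0, 0, 0).
    """
    coords = np.asarray(coords, dtype=float)
    origin = coords.min(axis=0) - padding * step
    idx = np.floor((coords - origin) / step).astype(int)
    shape = tuple(idx.max(axis=0) + 1 + padding)
    grid = np.zeros(shape)
    np.add.at(grid, tuple(idx.T), 1.0)  # unbuffered: repeated cells accumulate
    return grid, origin


def receptor_values(density, core_threshold, penalty=-15.0):
    """Hypothetical mapping: lightly filled (surface-like) cells keep their atom
    count, densely packed core cells get a large negative penalty."""
    values = density.copy()
    values[density >= core_threshold] = penalty
    return values
```

Because cells are never subdivided, no feature smaller than the 7 Å step enters the image. The atom counts only grade each cell, as in the "shades of gray" analogy. The resulting grids would replace the plain occupancy values in the unchanged correlation step.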

As  follows  from  this  modification,  the  dependence  on  atomic  density  is  digitized  on 
a  low-resolution  grid.  Thus,  our  basic  assumption  of  no  structural  features  below  the 
low-resolution  grid-step  still  holds.  To  use  an  analogy  with  regular  vision,  the  atom  density 
modification  means  that  a  subject  with  “low-resolution”  vision  may  distinguish  not  only 
between  black  and  white,  but  also  between  different  colors  as  well  as  their  densities. 


RESULTS 

The high-resolution molecular recognition algorithms, at least in theory, predict the
“exact” position of the ligand in complex with the receptor. This means that six parameters,
the three translations and three rotations of the rigid-body ligand, have to be determined.
Traditionally, these parameters are considered together, as one set, which is quite justifiable
for high-resolution docking. Indeed, when the molecules are represented with high precision,
there is little room, for example, to rotate the ligand after it has been translated to the correct
position. There are, however, certain indications of multiplicity, at least for predicted ligand
rotations at the binding site of the receptor (16). In our experience, this multiplicity
increases dramatically at low resolution, and the increase is accompanied by a decrease in the
reliability with which the multiple-valued parameter(s) are determined.
In other words, high- and low-resolution docking differ as follows. Typical results of
high-resolution docking contain multiple false-positive matches, the one “fully correct” match,
and few (or no) “partially correct” matches (e.g. those which correctly predict some five out of
the total of six parameters). In low-resolution recognition, again along with false positives,
there are usually no “fully correct” matches but multiple “partially correct” ones. This
characteristic feature of low-resolution docking means that it is reasonable to consider the six
parameters of the rigid-body ligand translation and rotation separately, in the following
sequence: three translations (the correct values are equivalent to the prediction of the receptor
binding site), two angles of rotation which determine the orientation of the ligand binding
site toward the receptor (the correct values are equivalent to the prediction of the ligand
binding site), and one spin angle around the axis which connects the two binding sites already
in contact (which determines the final “lock” of the two molecules).

We applied the procedure to different molecular complexes from the Brookhaven
Protein Data Bank (29). Some of the molecules are shown in Fig. 2. The complexes were
selected so that, while the “receptor” was always a macromolecule (more
than 1000 atoms), the size of the “ligand” varied from medium (β subunits of human deoxy-
and horse met-hemoglobins, lysozyme), to small proteins (trypsin and chymotrypsin inhibitors,
ovomucoid third domain), and further to peptides and tyrosinyl adenylate (all in the
receptor-bound conformation). In terms of their nature, the chosen examples
represent multisubunit proteins (hemoglobins), enzyme-inhibitor complexes (trypsin,
chymotrypsin, and subtilisin with their inhibitors), antigen-antibody complexes (Fab with
lysozyme and with a peptide), and such complexes as tRNA synthetase-tyrosinyl adenylate
and the MHC I molecule with a peptide.

Figure 2. Cross sections through
3D grid images of (a) human hemoglobin
α (left) and β (right) subunits;
(b) trypsin (left) and trypsin
inhibitor (right); (c) acid proteinase
(left) and its peptide inhibitor
(right). The step of the grid is 7 Å.
Different shades of gray represent
various degrees of atom density
within the corresponding cube of the
grid for the “receptor” (R) and the
“ligand” (L). White areas inside the
“receptor” molecules indicate negative
values (to avoid deep intermolecular
penetrations, Refs. 17,28).
Solid line contours show the actual
position of the ligands in the
co-crystallized complexes.

All results were analyzed in two stages: first for the three translational coordinates
(prediction of the receptor’s binding site) and then, in configurations with the correct
translation, for the two rotational coordinates (prediction of the ligand’s binding site). The
axis P for the third angle of rotation was defined as the line between the centers of gravity of
the entire ligand molecule and of its binding site. In cases where the whole ligand molecule is
in contact with the receptor (e.g. peptides in complex with aspartic proteinase or MHC), we chose
the main axis of the ligand instead. No molecular complex showed any significant preference
toward the correct values of the sixth coordinate (spin angle around the binding site axis).
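The two-stage analysis, together with the binning described in the Fig. 3 caption, can be sketched as follows. Translation errors are counted in 7 Å grid steps and orientation errors in 20° steps, and the orientation histogram uses only hits with the correct translation. Several points are assumptions made only for this illustration. Translation distance is taken as the Euclidean distance rounded to the nearest step. Centers of gravity are taken as unweighted atom centroids. The main-axis variant used for fully buried peptides is not covered. All function names are hypothetical.

```python
from collections import Counter

import numpy as np

GRID_STEP = 7.0    # Å, translation bin width
ANGLE_STEP = 20.0  # degrees, orientation bin width


def axis_p(ligand_coords, site_indices):
    """Vector from the ligand's center of gravity to that of its binding site
    (unweighted atom centroids, an assumption)."""
    xyz = np.asarray(ligand_coords, dtype=float)
    return xyz[list(site_indices)].mean(axis=0) - xyz.mean(axis=0)


def translation_step(pos, ref_pos):
    # Assumption: Euclidean distance rounded to the nearest whole grid step.
    d = np.linalg.norm(np.asarray(pos, float) - np.asarray(ref_pos, float))
    return int(round(d / GRID_STEP))


def orientation_step(p, ref_p):
    """Angle between axis P of a hit and of the correct orientation, in 20° bins 0-8."""
    a, b = np.asarray(p, float), np.asarray(ref_p, float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    angle = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    last_bin = int(180 // ANGLE_STEP) - 1  # exactly 180° falls into bin 8
    return min(int(angle // ANGLE_STEP), last_bin)


def docking_histograms(hits, ref_pos, ref_axis_p, top=100):
    """hits: iterable of (score, position, axis_p) tuples.
    Returns (translation histogram, orientation histogram of step-0 hits)."""
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)[:top]
    trans, rot = Counter(), Counter()
    for _score, pos, p in ranked:
        step = translation_step(pos, ref_pos)
        trans[step] += 1
        if step == 0:
            rot[orientation_step(p, ref_axis_p)] += 1
    return trans, rot
```

The sixth (spin) coordinate is deliberately absent, matching the observation that no complex showed a preference for its correct value.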

For human deoxyhemoglobin, α-β subunits (2HHB, Ref. 30), the low-resolution
representation of the molecules is shown in Fig. 2a. The results of an exhaustive search
through all six docking coordinates (Fig. 3a) show that the correct values of the translational
coordinates (translation of the β subunit relative to the α subunit) are determined quite
unequivocally. The rotational coordinates are resolved much less distinctly; however, the trend
towards the correct binding-site orientation of the “ligand” (β subunit) is clearly indicated.

In the case of horse methemoglobin, α-β subunits (2MHB, Ref. 31), the translation of
the β subunit is predicted with even better accuracy than for 2HHB (Fig. 3b). The orientation
coordinates show a moderate trend toward the correct values.

The molecular images of trypsin and trypsin inhibitor (2PTC, Ref. 32) are presented
in Fig. 2b. The trypsin binding site was predicted with remarkable accuracy (Fig. 3c). All of
the first 100 highest-scoring positions had the correct translational values. In fact, the first
configuration with a wrong translation of the ligand appeared as number 274 in the sorted
list of predicted positions. However, the prediction of the ligand orientation is poor, though
the left part of the distribution (correct directions) slightly prevails. Such a non-distinctive
ligand orientation may be correlated with the strong results in the translation prediction.
Indeed, the less sensitive the low-resolution molecular images are to the ligand orientation,
the more configurations with the correct translations and different orientations will be found.

For chymotrypsin and ovomucoid third domain (1CHO, Ref. 33), the “absolute”
translational results (Fig. 3d) are similar to those of the trypsin-trypsin inhibitor complex.
However, in contrast to 2PTC, the orientation sampling reveals a distinctive peak at the
correct values in the angle distribution.

The results on subtilisin-chymotrypsin inhibitor (2SNI, Ref. 34) give a high-quality
prediction of the subtilisin binding site (Fig. 3e). However, not all of the first 100 matches are
correct, as they were for the other enzyme-inhibitor (small protein) complexes (2PTC, 1CHO).
The trend toward the correct ligand orientation is weak, as in the case of 2PTC.

The complex between acid proteinase and a peptide inhibitor (3APR, Ref. 35), shown
in Fig. 2c, yielded perfect results on the ligand translation (Fig. 3f). Unlike all
previous complexes, it also gives a very good orientation prediction. The angle distribution has
a certain symmetry, in which the values near 0° and 180° strongly dominate over those around
90°, clearly reflecting the symmetry of the ligand (an elongated peptide). Of course, the
difference between the C- and N-termini (0° and 180°) is negligible at ultra-low resolution.

In the case of tRNA synthetase-tyrosinyl adenylate (3TS1, Ref. 36), the translational
and rotational distributions (Fig. 3g) are similar to those of the 3APR complex, and their
character is even more pronounced. Just as in the previous case, the rotational distribution
reflects the natural symmetry of the ligand.

The histograms in Fig. 3h show the results of docking a model 9-residue peptide to
the MHC I molecule (1HSA, Ref. 37). In our computer experiment, we retained the α1 and
α2 subunits, which contain the binding site on MHC I. The pattern of distributions is similar
to those of 3APR and 3TS1; however, the results are weaker than in both these cases. The
reason might be that in the 3APR and 3TS1 complexes the ligands are deeply immersed in the
receptor molecule, while in 1HSA the receptor site is shallow. Along with matching the
peptide from the 1HSA complex with the α1-α2 subunits of MHC, we also tested docking modes
on the entire MHC molecule. The resulting distributions (not shown) appeared quite
disordered. This may be attributed to the considerably larger size and complicated surface
structure of the entire MHC I molecule, which, combined with the shallow character of the
binding site, creates highly competitive false-positive matches for the ligand at ultra-low
resolution.

Figure 3. Docking results for different molecular complexes (a-j). For each complex the left panel represents
the number of hits at various distances from the correct ligand position. Only the first 100 highest-score hits
were analyzed. Distances are shown in translation (grid) steps of 7 Å; thus, step 0 corresponds to the correct
position (within the ultimate highest accuracy of 7 Å). Right panels show the relative distribution of ligand
orientations for the hits with the correct translations only. The orientation is calculated as the angle between
the axis P (see the text) of the given and the correct orientations, in steps of 20°; thus, step 0 corresponds
to the correct orientation (within 20° accuracy). The left part of the histogram (steps 0-3) corresponds to
ligand binding-site orientations that point toward the receptor.

The results on the Fab fragment-lysozyme complex (2HFL, Ref. 38) (Fig. 3i) are quite
different from all the previous ones. The ligand’s translation was essentially not predicted (the
small value at step 0 is negligible). The same applies to the rotational distribution, which
was not informative at all, since it was based on very poor statistics.

For the Fab fragment and a peptide (1GGI, Ref. 39), docking a peptidic ligand
(instead of a small protein as in 2HFL) to an antibody yielded somewhat better results
(Fig. 3j). The translational part is still much weaker than in the rest of the cases, although the
correct translations are represented better than in 2HFL. The correct rotational values
(including the “symmetric” part of the distribution opposite to 0°) are very distinctive and
look much more dominant than in any other complex. The character of both distributions
suggests high “specificity” of the complex (a small number of configurations with correct
translations, most of which also have correct ligand orientations), as opposed to the
low “specificity” of complexes like 2PTC (a large number of configurations with correct
translations and different ligand orientations).


DISCUSSION  AND  CONCLUSIONS 

The distributions of the ligand positions in the 10 tested complexes show different
patterns for different groups of molecules. The complexes between medium-size proteins,
such as subunits of hemoglobins, which are characterized by large intermolecular interfaces
with no distinct “global” concavity at the “receptor” site, demonstrate good translation
predictions and moderate orientation preferences. The enzyme-inhibitor complexes in which
the inhibitor is a small protein are characterized by very strong translation
predictions, which, at the same time, are not very specific to the ligand orientation. The
complexes with small oligopeptide-size ligands also give good translation predictions
(except the antigen-antibody complex). In addition, they demonstrate an exceptional
quality of orientation prediction, which is symmetric owing to the similarity between the
two ends of elongated ligands at ultra-low resolution. The results on the antigen-antibody
complexes tested are, in general, different from the rest of the cases. They are characterized
by very few predictions with correct translations, which, however, are very specific to the ligand
orientation (in the case of the lysozyme-antibody complex, this feature might be lost because of
non-representative statistics in the rotational distribution).

The translational and rotational histograms are convenient and simple tools
for examining docking results at ultra-low resolution. However, they are far from
sufficient for more detailed analysis. Reducing the three-dimensional (translational) and
two-dimensional (rotational) results to one-dimensional representations helps to reveal very
clearly the existing trend toward the correct configurations. At the same time, it makes it
impossible to employ such methods as cluster analysis of the 3D and 2D results. If such clusters
exist, their examination could contribute to a more objective view of the predictions. For example,
in the case of the antigen-antibody complex 2HFL, the predominant high-score false-positive
matches (Fig. 3i), if found to belong to the same “wrong” binding site, could be reduced to
a single, most representative match, thus allowing the analysis of matches with lower scores.
Histograms of “clusters”, rather than the histograms of “individual matches” which we
used, might be quite helpful in dealing with cases such as 2HFL, where the correct binding mode
does not dominate. We leave this analysis for the future, as the present paper concentrates
on the principal trends in molecular recognition at ultra-low resolution. For this purpose, the
simple 1D analysis helps to reveal generalities which could otherwise have been less
explicit.
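The cluster idea is left for future work in the text above, so the following is only one possible illustration and not the authors' procedure. It applies greedy "leader" clustering to the predicted translations. Matches are visited in order of decreasing score. Each match joins the first cluster whose best-scoring representative lies within a hypothetical radius of one grid step (7 Å), and otherwise starts a new cluster. Building the histograms from cluster representatives instead of individual matches would then collapse a dominant "wrong" site into a single entry.

```python
import numpy as np


def leader_clusters(matches, radius=7.0):
    """Greedy leader clustering of docking matches (illustrative sketch).

    matches: iterable of (score, position) with position in Å.
    Returns a list of (representative, members) pairs, best cluster first;
    the representative is the highest-scoring member of its cluster.
    """
    clusters = []
    for score, pos in sorted(matches, key=lambda m: m[0], reverse=True):
        p = np.asarray(pos, dtype=float)
        for rep, members in clusters:
            if np.linalg.norm(p - np.asarray(rep[1], dtype=float)) <= radius:
                members.append((score, pos))
                break
        else:  # no existing cluster is close enough
            clusters.append(((score, pos), [(score, pos)]))
    return clusters
```

Leader clustering depends on the visiting order. Visiting by score guarantees that each representative is the best match of its group, which is the property the discussion asks for.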

Our approach provides an important instrument for practical docking studies of
molecules whose structures are too uncertain to fit into high-resolution docking procedures.
Low-resolution docking may be successfully used as a first, preliminary stage, followed
by a “regular” high-resolution procedure. This may help to preselect
potential areas of ligand binding in the case of local conformational changes upon complex
formation (which are of little importance at ultra-low resolution). We will also use this
approach as a stand-alone procedure for molecules with low-resolution structures (e.g. NMR or
modeled structures). A resolution of ~7 Å is more than enough to accommodate the inaccuracies
of many low-resolution experimental and modeled structures, which will open new
opportunities for the investigation of molecular mechanisms.
INTRODUCTION

An  important  goal  in  sequence  analysis  is  to  identify  features  in  sequence  alignments 
that  are  reflections  of  the  evolutionary  history  of  the  sequences  rather  than  artifacts  of  the 
alignment  method.  We  have  developed  a  new  measure  for  determining  a  distance  between 
sequence  alignments  that  will  assist  us  in  reaching  this  goal.  We  apply  this  new  measure  to 
alignments  produced  by  three  different  sequence  alignment  programs  and  an  alignment 
created  by  structural  superposition. 

Multiple sequence alignments are valuable tools for identifying the amino acids
critical for the structural and functional integrity of a protein. Alignments of homologous
proteins also succinctly summarize the evolutionary history of the protein. In recent years,
several different techniques for creating multiple alignments of proteins have been published
(Barton and Sternberg, 1987; Lipman et al., 1989; Feng and Doolittle, 1990). Implicit in
each alignment program are different assumptions about how to treat the evolution of
sequences (Altschul, 1989; Altschul and Lipman, 1989). All multiple alignment techniques
have potential theoretical shortcomings (Altschul, 1989; Altschul and Lipman, 1989). We
will explore how these different assumptions lead to different alignments by applying our
new measure to alignments from three multiple sequence alignment programs and a structural
superposition alignment.

Our  new  measure  for  comparing  alignments  has  several  advantages  over  previously 
used  methods.  Most  importantly,  our  measure  is  a  true  metric  that  meets  the  mathematical 
criteria for a distance between alignments. This property is important because it
allows us to compare several different alignment techniques simultaneously (Kruskal, 1983).
Nonmetric  measures  allow  only  pairs  of  alignments  to  be  directly  compared.  Hence,  if 
several  methods  are  contrasted  with  a  nonmetric  measure,  one  of  the  alignments  must  be 
designated  as  the  standard  alignment,  assumed  to  be  more  reliable,  to  which  the  other 
alignments  are  compared. 
The second major advantage of our alignment measure is that it incorporates a more quantitative and discriminating assessment of the variations in the size and location of gaps. Methods such as determining the percentage of amino acids that are identically aligned measure the location and size of gaps in a dichotomous “the same location and size” or “not the same location and size” manner. Our alignment measure makes a much more quantitative and graduated measurement of gap placement. Since an alignment can be completely described by the locations and lengths of the gaps placed into each sequence, gap location and size are critical aspects of any comparison of alignments.
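The statement that an alignment is fully described by its gaps can be made concrete. The Python sketch below is our own illustration, not code from any of the programs discussed: it records each run of gap characters in an aligned row as a (position, length) pair, where position counts the residues that precede the gap, and then rebuilds the row from the ungapped sequence. The gap character '.' follows Figure 1; the function names are hypothetical.

```python
from typing import List, Tuple

Gap = Tuple[int, int]  # (number of residues preceding the gap, gap length)


def gaps_of(row: str, gap: str = ".") -> List[Gap]:
    """List every run of gap characters in an aligned row as (position, length)."""
    gaps: List[Gap] = []
    residues_seen = 0
    run_start, run_len = 0, 0
    for ch in row:
        if ch == gap:
            if run_len == 0:
                run_start = residues_seen  # a new gap run begins here
            run_len += 1
        else:
            if run_len:
                gaps.append((run_start, run_len))
                run_len = 0
            residues_seen += 1
    if run_len:  # gap run at the very end of the row
        gaps.append((run_start, run_len))
    return gaps


def apply_gaps(seq: str, gaps: List[Gap], gap: str = ".") -> str:
    """Rebuild an aligned row from its ungapped sequence and its gap list."""
    insert_at = dict(gaps)  # position -> gap length
    pieces = []
    for pos, residue in enumerate(seq):
        pieces.append(gap * insert_at.get(pos, 0))
        pieces.append(residue)
    pieces.append(gap * insert_at.get(len(seq), 0))  # trailing gap, if any
    return "".join(pieces)


row = "YGt.gsleGyisqDtlsi.gdl"  # start of the Carp_Y structural superposition row
print(gaps_of(row))  # [(3, 1), (17, 1)]
assert apply_gaps(row.replace(".", ""), gaps_of(row)) == row
```

The round trip shows that the ungapped sequence plus the gap list loses nothing, which is why comparing gap placement is a complete way to compare alignments.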

A third feature of our proposed alignment measure is that it has a straightforward interpretation that is easily visualized: the distance between two alignments is the area between the path graphs of the two alignments.


METHODS  and  DATA 
Multiple  Sequence  Alignments 

The MSA (multiple sequence alignment) alignment was generated on the Cray-C90 at the Pittsburgh Supercomputing Center with the program from the National Center for Biotechnology Information described by Lipman et al. (1989). The multiple sequence alignment was generated using the PAM 250 (Dayhoff et al., 1978) similarity matrix converted to differences (Smith et al., 1981), a gap-opening cost of eight, and a gap-extension cost of 12 (Altschul, 1989). These are the default values for these parameters. We selected the option to weight the sequences so that the alignment scores would approximate an alignment scored by summing the costs over the phylogenetic tree estimated by the program. We selected the optimal alignment rather than the heuristic alignment and elected to weight end-gaps.

The  AMPS  (alignment  of  multiple  protein  sequences)  alignment  was  generated  by 
the  VAX  version  of  the  program  initially  described  by  Barton  and  Sternberg  (1987).  The 
alignment  was  generated  with  the  PAM  250  similarity  matrix  and  a  gap  score  of  eight.  The 
AMPS program does not use a separate penalty for extending gaps. Thus, in the AMPS
alignment, each gap incurs the same penalty regardless of its length. Barton (1990) has
found  this  to  be  an  effective  strategy.  The  order  in  which  sequences  were  joined  in  generating 
the  alignment  was  computed  by  the  program  using  normalized  alignment  scores  from  one 
hundred  randomizations  of  the  sequences.  We  used  the  option  to  generate  the  alignment 
guided  by  a  phylogenetic  tree  rather  than  a  single  order  alignment  (Barton,  1990). 

The PileUp alignment was generated by the PileUp program included in release 7 of the Genetics Computer Group suite of programs (Genetics Computer Group, 1991). Again we used the PAM 250 similarity matrix to generate the alignments. The gap-opening penalty was minus eight and the gap-extension penalty was also minus eight. The rationale for this choice was to make the total cost of a gap of two amino acids equal in the PileUp and MSA programs. An exact equivalence for all gap lengths is not possible, since PileUp uses similarity scoring and MSA uses distance scoring (Smith et al., 1981). The remaining difference between the gap penalty functions of the two programs allows us to gather information on its consequences. The option to weight end-gaps was also chosen, to remain consistent with both MSA and AMPS. Weighting end-gaps is the theoretically correct choice for the aspartyl protease sequences used in this study.
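To see how the gap penalty shapes differ, the sketch below compares a length-independent gap cost, as described for AMPS, with an affine cost of the assumed form open + extend × length. The parameter values are illustrative only. Programs differ in whether the first gap position is also charged an extension term, and MSA scores distances while PileUp scores similarities, so these numbers do not reproduce either program's scores. The function names are hypothetical.

```python
from functools import partial
from typing import Callable, Iterable


def affine_gap_cost(length: int, open_cost: float, extend_cost: float) -> float:
    """Assumed affine form: a fixed opening charge plus a charge per gap position."""
    if length < 1:
        raise ValueError("gap length must be positive")
    return open_cost + extend_cost * length


def constant_gap_cost(length: int, gap_cost: float) -> float:
    """AMPS-style penalty as described in the text: one cost for any gap length."""
    if length < 1:
        raise ValueError("gap length must be positive")
    return gap_cost


def total_gap_cost(gap_lengths: Iterable[int], cost: Callable[[int], float]) -> float:
    """Sum the per-gap cost over all gaps in an alignment."""
    return sum(cost(k) for k in gap_lengths)


amps_like = partial(constant_gap_cost, gap_cost=8)
affine = partial(affine_gap_cost, open_cost=8, extend_cost=8)

for label, cost in [("constant", amps_like), ("affine", affine)]:
    split = total_gap_cost([1, 1], cost)  # two separate one-residue gaps
    joined = total_gap_cost([2], cost)    # one two-residue gap
    print(f"{label}: split={split}, joined={joined}")
# constant: split=16, joined=8
# affine: split=32, joined=24
```

Both forms charge more for two gaps than for one, but under the constant cost a two-residue gap is no dearer than a one-residue gap, whereas under the affine cost every added gap position increases the penalty.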

The structural superposition alignments were taken from an alignment database made available by Pascarella and Argos (1992) on the EMBL file server. We edited the alignments to restore the isolated insertions that Pascarella and Argos had removed from the listing after the alignment was initially generated. These changes were necessary to put the comparison with the other alignments on the same basis.


EXAMPLE  SEQUENCE  ALIGNMENTS 

We will apply the alignment measure to four multiple sequence alignments of six aspartyl proteases. Table 1 identifies the sequences and summarizes the alignment of each pair of aligned sequences. The aligned pairs of aspartyl proteases range from twenty-five to forty-three percent identical. We expect sequences with this level of similarity to be able to select each other as related sequences when used as a query in a database search. However, this level of diversity is great enough that alignment of the sequences is challenging.

A  fragment  from  the  four  multiple  sequence  alignments  of  the  six  aspartyl  proteases 
(Figure  1)  was  selected  to  illustrate  many  of  the  uncertainties  that  arise  with  current  multiple 
sequence  alignment  methods.  Within  this  fragment  there  are  five  amino  acids  that  all  four 
alignments  identify  as  completely  conserved.  Another  five  amino  acids  are  identified  as 
completely  conserved  in  two  or  more  of  the  alignments.  We  postulate  that  these  residues 
should  be  aligned  and  accepted  as  completely  conserved. 


Table 1.  Comparative statistics for six aspartyl proteases

          Carp_Y       Pepc_M       Chym_B       Catd_H       Carp_R       Pepa_A
Carp_Y    329          40/57/2      35/61/3      42/51/5      34/60/5      26/65/8
Pepc_M    135/190/7    328          43/52/3      40/53/6      29/64/5      28/62/9
Chym_B    118/203/10   145/175/11   323          43/49/7      32/63/3      25/68/5
Catd_H    149/179/19   141/185/22   151/171/25   348          29/61/9      25/63/10
Carp_R    114/204/17   99/217/20    107/210/13   103/217/32   325          33/60/5
Pepa_A    89/224/28    97/214/31    84/230/20    92/225/37    112/203/19   325

Each off-diagonal cell gives three numbers for the alignment of the pair of sequences named by its row and column headings, in the order identities/differences/gaps. Numbers below the diagonal are the numbers of amino acids in these categories. Numbers above the diagonal are percentages. Percentages may not add to one hundred because of rounding errors. Numbers on the diagonal are the lengths of the sequences. The statistics were computed by the MALIGNED multiple sequence alignment editor (Clark, 1992). The complete Swiss-Prot identifier and accession number corresponding to each of the six abbreviated aspartyl protease names are: Carp_Y = (Carp_Yeast, P07267); Pepc_M = (Pepc_Macfu, P03955); Chym_B = (Chym_Bovin, P00794); Catd_H = (Catd_Human, P07339); Carp_R = (Carp_Rhich, P06026); and Pepa_A = (Pepa_Aspsw, P17946). Any signal and propeptide fragments were removed from the sequences in the database so that only the sequences for the mature enzymes were aligned.
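The Table 1 statistics can be read as classifying every position of a pairwise alignment as an identity, a difference, or a gap. The counts are consistent with this reading: for Carp_Y and Pepc_M, 135 + 190 + 7 = 332 = (329 + 328 + 7)/2 positions. The percentages also match truncation rather than rounding to nearest (145 of 331 positions appears as 43%). The sketch below computes the statistics under that reading; it is our illustration, not the MALIGNED implementation, and the function names are hypothetical.

```python
from typing import Tuple

Counts = Tuple[int, int, int]  # (identities, differences, gaps)


def truncated_percent(count: int, total: int) -> int:
    """Percentage rounded toward zero (our reading of the table's rounding)."""
    return 100 * count // total


def pair_statistics(row_x: str, row_y: str, gap: str = ".") -> Tuple[Counts, Counts]:
    """Classify each column of a pairwise alignment as identity, difference, or gap."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have equal length")
    ident = diff = gaps = 0
    for a, b in zip(row_x, row_y):
        if a == gap and b == gap:
            continue  # column belongs to other rows of a multiple alignment
        if a == gap or b == gap:
            gaps += 1
        elif a.upper() == b.upper():  # Figure 1 uses case only to flag conserved columns
            ident += 1
        else:
            diff += 1
    total = ident + diff + gaps
    counts = (ident, diff, gaps)
    return counts, tuple(truncated_percent(n, total) for n in counts)


print(pair_statistics("YGt.gsleG", "YGsgs.ltG"))  # ((4, 3, 2), (44, 33, 22))
```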
Structural Superposition

*** $ $ #
Carp_Y  YGt.gsleGyisqDtlsi.gdltipkQdfaeatsepgltFafg.kfDG
Pepc_M  YGsgs.ltGffgyDtltv.qsiqvpnQefglsenepgtnFvya.qfDG
Chym_B  YGtgs.mqGilgyDtvtv.snivdiqQtvglstqepgdvFtya.efDG
Catd_H  YGs.gslsGylsqDtvsvpcqsassasalggvkverQvfgeatkqpgitFiaa.kfDG
Carp_R  YGdgssasGilakDnvnl.ggllikgQtielakreaa.sFang.pnDG
Pepa_A  YGdgssasGdvyrDtvtv.ggvttnkQaveaaskiss.eFvqntanDG

MSA

** $ $ #
Carp_Y  YGtGS.leGyisqDtlsi.gdltipkQdfaeatsepgltFa.fgkfDG
Pepc_M  YGsGS.ltGffgyDtltv.qsiqvpnQefglsenepgtnFv.yaqfDG
Chym_B  YGtGS.mqGilgyDtvtv.snivdiqQtvglstqepgdvFt.yaefDG
Catd_H  YGsGS.lsGylsqDtvsvpcqsassasalggvkverQvfgeatkqpgitFi.aakfDG
Carp_R  YGdGSsasGilakDnvnl.ggllikgQtielakreaa.sFa.ngpnDG
Pepa_A  YGdGSsasGdvyrDtvtv.ggvttnkQaveaaskiss.eFvqntanDG

AMPS

** $ $ #
Carp_Y  YGtGS.leGyisqDtlsig.dltipkQdfaeatsepgltfafgkfDG
Pepc_M  YGsGS.ltGffgyDtltvq.siqvpnQefglsenepgtnfvyaqfDG
Chym_B  YGtGS.mqGilgyDtvtvs.nivdiqQtvglstqepgdvftyaefDG
Catd_H  YGsGS.lsGylsqDtvsvpcqsassasalggvkverQvfgeatkqpgitfiaakfDG
Carp_R  YGdGSsasGilakDnvnlg.gllikgQtielakreaa.sfangpnDG
Pepa_A  YGdGSsasGdvyrDtvtvg.gvttnkQaveaaskissefvqntanDG

PileUp

** $ $ #
Carp_Y  YGtGSlegyisqdtlsi.gdltipkQdfaeatsepgltfafgkfDG
Pepc_M  YGsGSltgffgydtltv.qsiqvpnQefglsenepgtnfvyaqfDG
Chym_B  YGtGSmqgilgydtvtv.snivdiqQtvglstqepgdvftyaefDG
Catd_H  YGsGSlsgylsqdtvsvpcqsassasalggvkverQvfgeatkqpgitfiaakfDG
Carp_R  YGdGS.sasgilakdnvnlggllikgQtielakreaa.sfangpnDG
Pepa_A  YGdGS.sasgdvyrdtvtvggvttnkQaveaaskissefvqntanDG

Figure 1. Multiple sequence alignments of six aspartyl proteases.


We will discuss, in turn, each of these five amino acids. One point that will emerge is that the differences among the alignments can be succinctly and efficiently described in terms of where the different alignment methods place small gaps. This suggests that, while identifying homologous amino acids in related proteins reveals the biochemically important features, the alignment process itself may be fruitfully understood in terms of the placement of gaps.

Glycine  Serine  Dipeptide 

We will first consider the Glycine Serine (GS) dipeptide marked with asterisks above the alignments. The three multiple sequence alignment programs, which base their alignments entirely on phylogenetic or sequence information, align this dipeptide as completely conserved, while the structural superposition alignment does not. The two unaligned GS dipeptides in the structural superposition alignment have a single amino acid gap placed in a different location from the gap placed in the other three alignments. Examination of the three-dimensional structures associated with these sequences indicates that the gap placement is likely an artifact of the structural superposition alignment method rather than an accurate reflection of the evolutionary history of these sequences.

In all cases the GS dipeptide is the first two amino acids after a β turn in a sheet-hairpin turn-sheet motif (see Brookhaven Protein Data Bank entries 1cms, 2apr, and 4pep; Abola et al., 1987). The hairpin turn is either two or three residues long, with the first two residues being a β turn. Thus, the GS is either the first two residues of the sheet (when the hairpin turn is two residues long) or the last residue of the hairpin turn and the first residue of the sheet (when the hairpin turn is three residues long). The structural superposition alignment algorithm forces the hairpin turns to be the same length; thus, for the shorter hairpin turn, a gap is inserted. The residues in the sheet-hairpin turn-sheet are on the surface of the protein. Rather than an insertion in the hairpin turn, it appears that the second strand of the sheet has shifted one position because of steric interactions with core residues of the protein.

We believe that, in this instance, the sequence-based alignments represent the evolutionary history of the protein more accurately than does the structural superposition alignment.

Glycine  Aspartic  Acid  Region 

The dollar signs in Figure 1 mark two amino acids, a Glycine and an Aspartic acid, that are found to be conserved in all the alignments except the PileUp alignment. While it may appear that the eleven-amino-acid insertion in the Catd_H sequence is somehow responsible for misleading the PileUp program relative to the other programs, this is probably not the cause.

The probable cause of this difference is the PileUp gap function, which contains both a penalty for opening a new gap and a penalty for extending the gap. This form of the gap penalty explicitly makes short gaps less likely. This bias against short gaps prevents PileUp from inserting the single amino acid gap, either before or after the conserved GS dipeptide, that is inserted by all the other programs. The MSA program inserted the single amino acid gap after the conserved GS even though it also used a gap function that includes penalties for both opening and extending gaps. The difference between these gap functions is that MSA weights the contributions from each pair of sequences so that the alignment scores approximate alignment scores summed over a phylogenetic tree rather than a simple sum-of-pairs alignment score. This effectively reduces the cost of introducing this gap.
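How pair weights can lower the cost of a gap is easiest to see in a weighted sum-of-pairs score. The Python sketch below is an illustration under stated assumptions, not MSA's algorithm: each pair of rows is projected onto its induced pairwise alignment, each gap in that pairwise alignment is charged an assumed affine cost, and each pair's cost is multiplied by a weight. The weights are hypothetical; MSA derives its own weights from the estimated phylogenetic tree, and that calculation is not reproduced here.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def gap_runs(row: str, gap: str = ".") -> List[int]:
    """Lengths of the maximal runs of gap characters in a row."""
    runs, current = [], 0
    for ch in row:
        if ch == gap:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def induced_pair(row_a: str, row_b: str, gap: str = ".") -> Tuple[str, str]:
    """Project two rows of a multiple alignment onto a pairwise alignment
    by dropping the columns in which both rows have a gap."""
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


def weighted_sp_gap_cost(
    rows: Sequence[str],
    weights: Dict[Tuple[int, int], float],
    open_cost: float,
    extend_cost: float,
    gap: str = ".",
) -> float:
    """Sum over row pairs of weight * affine gap cost of the induced pairwise alignment.

    Pairs missing from `weights` get weight 1.0 (a plain sum-of-pairs score).
    """
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        a, b = induced_pair(rows[i], rows[j], gap)
        pair_cost = sum(open_cost + extend_cost * k
                        for k in gap_runs(a, gap) + gap_runs(b, gap))
        total += weights.get((i, j), 1.0) * pair_cost
    return total


rows = ["AC.D", "ACGD", "AC.D"]
print(weighted_sp_gap_cost(rows, {}, open_cost=8, extend_cost=2))  # 20.0
print(weighted_sp_gap_cost(rows, {(0, 1): 0.5, (1, 2): 0.5}, open_cost=8, extend_cost=2))  # 10.0
```

In this toy example, down-weighting the two pairs that contribute a gap halves the total gap cost, which is the kind of reduction attributed above to MSA's sequence weighting.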

Based on these considerations, we believe that the alignments in which both the Glycine and the Aspartic acid are conserved are more likely to represent the actual evolution of these aspartyl proteases.

Phenylalanine 

The  final  case  we  will  examine  is  the  pound  sign  marking  a  Phenylalanine  that  is 
completely  conserved  in  the  structural  superposition  and  the  MSA  alignments  but  not  in  the 
alignments  generated  by  the  progressive  pairwise  alignment  programs,  AMPS  and  PileUp. 
Maintaining  this  Phenylalanine  as  completely  conserved  would  require  these  two  programs 
to  introduce  a  pair  of  new  gaps.  One  new  gap  is  required  before  the  Phenylalanine  in  the 
Pepa_A  sequence  to  bring  the  Phenylalanines  into  alignment.  The  second  gap  is  required 
after the Phenylalanine in all the sequences other than the Pepa_A sequence to keep the
conserved Aspartic acid Glycine dipeptide at the C-terminal end of the fragment aligned.

This case is more difficult to assess than the previous two. There are several factors that argue in favor of accepting the conservation of the Phenylalanine. First, we are inclined to give heavy weight to the structural superposition alignment when it identifies large hydrophobic amino acids as homologous, as it does here. Second, this arrangement is a commonly observed local minimum encountered in progressive pairwise alignments.

In  both  the  AMPS  and  the  PileUp  alignments,  adding  the  Pepa_A  sequence  to  the 
alignment  of  the  other  five  sequences  was  the  final  step  in  the  alignment  process.  Aligning 
the  Pepa_A  Phenylalanine  with  the  already  aligned  Phenylalanines  required  opening  two 
gaps,  one  in  the  Pepa_A  sequence  and  one  in  the  preexisting  five  sequence  alignment.  This 
gap  barrier  was  too  large  for  the  pairwise  programs  to  overcome. 

We  tested  this  hypothesis  by  aligning  the  Pepa_A  sequence  with  the  other  sequences 
individually  (alignments  not  shown).  In  these  alignments  the  gaps  necessary  to  align  the 
Pepa_A Phenylalanine with the other Phenylalanines were inserted. Thus, had one of these
pairwise alignments been the first step in the AMPS or PileUp progressive alignment
procedure, all the Phenylalanines would have been aligned.

Three  factors  in  the  MSA  alignment  program  allow  it  to  overcome  this  gap  penalty 
barrier. The first is the weighting of the sequences. The second, and perhaps more important,
factor  is  that  the  MSA  program  has  a  much  improved  rationale  for  counting  gap  cost  over 
the  AMPS  and  PileUp  programs  (Altschul,  1989).  The  third  factor  is  that  MSA  aligns  all  the 
sequences  simultaneously  rather  than  in  a  progressive  manner.  All  three  factors  make  MSA 
theoretically  superior  to  both  AMPS  and  PileUp  and  thus  expected  to  give  more  biologically 
reasonable  alignments.  These  factors  argue  in  favor  of  accepting  the  MSA  alignment  over 
the  AMPS  and  PileUp  alignments. 

Based  on  all  these  factors  we  believe  that  the  Phenylalanine  should  be  accepted  as 
completely  conserved. 

Summary 

The variability in the alignments of these moderate-length fragments is representative of the variability in alignments that would be seen in many of the families of proteins that biochemists and molecular biologists are actively studying. Had we included alignments generated with different similarity matrices and different gap penalties, the variability in the alignments would have been even greater. To understand how the choice of alignment algorithm, similarity scores, and gap penalties affects the alignment of the sequences, we need to measure accurately the differences among alignments that result from changes in the different factors contributing to the alignment.


DISTANCES  BETWEEN  ALIGNMENTS 
Computing  the  Distance 

Comparing more than two different alignments of the same sequences simultaneously requires a measure that satisfies the mathematical definition of a distance (Kruskal, 1983). To be useful, the measure must also capture the important differences between alignments in a manner that is easily understood and visualized. We have developed a measure of the distance between sequence alignments that satisfies these criteria.
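Whether a proposed alignment distance behaves as a distance can be checked numerically on any set of alignments of the same sequences. The sketch below tests the standard axioms (zero self-distance, non-negativity, symmetry, and the triangle inequality) for a distance supplied as a function. The table values are hypothetical and are not taken from Table 2. The check does not confirm that distinct alignments have nonzero distance, since that requires the alignments themselves.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List


def metric_violations(
    items: Iterable[Hashable],
    dist: Callable[[Hashable, Hashable], float],
    tol: float = 1e-9,
) -> List[str]:
    """Report violations of d(a,a)=0, non-negativity, symmetry, and the triangle inequality."""
    items = list(items)
    problems: List[str] = []
    for a in items:
        if abs(dist(a, a)) > tol:
            problems.append(f"self-distance: d({a},{a}) = {dist(a, a)}")
    for a, b in permutations(items, 2):
        if dist(a, b) < -tol:
            problems.append(f"negative: d({a},{b}) = {dist(a, b)}")
        if abs(dist(a, b) - dist(b, a)) > tol:
            problems.append(f"asymmetric: d({a},{b}) != d({b},{a})")
    for a, b, c in permutations(items, 3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:
            problems.append(f"triangle: d({a},{c}) > d({a},{b}) + d({b},{c})")
    return problems


def from_table(table: Dict[FrozenSet[str], float]) -> Callable[[str, str], float]:
    """Distance lookup for a symmetric table keyed by unordered pairs."""
    def dist(a: str, b: str) -> float:
        return 0.0 if a == b else table[frozenset((a, b))]
    return dist


# Hypothetical distances between three alignments of the same pair of sequences.
ok = {frozenset(("Structure", "MSA")): 3, frozenset(("Structure", "AMPS")): 4,
      frozenset(("MSA", "AMPS")): 1}
bad = dict(ok)
bad[frozenset(("Structure", "AMPS"))] = 5
names = ["Structure", "MSA", "AMPS"]
print(metric_violations(names, from_table(ok)))  # []
print(metric_violations(names, from_table(bad)))
```

With a nonmetric measure, a violation like the one in `bad` would make it impossible to place the three alignments consistently relative to one another.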

To describe the alignment of a pair of sequences X and Y, the amino acids in sequence X are denoted as $x_i$ while those in sequence Y are denoted as $y_j$. Pairs of aligned amino acids in the alignment are denoted as $x_i{:}y_j$. When gaps are introduced into sequence X of the alignment following amino acid $x_i$, the index j is incremented to denote amino acids in sequence Y that are aligned with a gap character, while the index i is not incremented until the gap has ended. Thus a two-amino-acid gap in sequence X and its surrounding ungapped aligned amino acids would be denoted as: $x_i{:}y_j$, $x_i{:}y_{j+1}$, $x_i{:}y_{j+2}$, $x_{i+1}{:}y_{j+3}$.

With this notation we can write a simple summation that will yield the distance between two alignments. We denote alignment $A_1$ on sequences X and Y as having amino acid juxtapositions $x_i{:}y_j$ and denote alignment $A_2$ on sequences X and Y as having amino acid juxtapositions $x_i{:}y_k$. Then, if sequence X is N amino acids long, the distance between alignments $A_1$ and $A_2$, $D_{A_1:A_2}$, is given by:

$$D_{A_1:A_2} = \sum_{i=1}^{N} \left| j - k \right|$$
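A minimal Python sketch of this summation follows. The notation leaves one convention open: when several juxtapositions share the same $x_i$, or when $x_i$ faces a gap, which index of Y enters the sum? Here we assume, as our own reading, that j is the index of the Y residue in the same alignment column as $x_i$, or of the last Y residue before that column when $x_i$ faces a gap (0 if there is none). Under this assumption D is the L1 distance between the index vectors of the two alignments, so symmetry and the triangle inequality hold automatically. Rows use '.' for gaps, as in Figure 1; the function names are hypothetical.

```python
from typing import List, Tuple

Pairwise = Tuple[str, str]  # (aligned row of X, aligned row of Y)


def y_index_per_x(row_x: str, row_y: str, gap: str = ".") -> List[int]:
    """For each residue x_i (i = 1..N), the index j of the Y residue juxtaposed with it.

    Assumed convention: if x_i faces a gap, j is the last Y residue before it (0 if none).
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have equal length")
    j = 0
    indices = []
    for a, b in zip(row_x, row_y):
        if b != gap:
            j += 1             # one more residue of Y consumed
        if a != gap:
            indices.append(j)  # record the Y index paired with this x_i
    return indices


def alignment_distance(a1: Pairwise, a2: Pairwise, gap: str = ".") -> int:
    """D_{A1:A2} = sum over i of |j - k| under the convention above."""
    for r1, r2 in zip(a1, a2):
        if r1.replace(gap, "") != r2.replace(gap, ""):
            raise ValueError("both alignments must align the same two sequences")
    j = y_index_per_x(*a1, gap=gap)
    k = y_index_per_x(*a2, gap=gap)
    return sum(abs(ji - ki) for ji, ki in zip(j, k))


# Two alignments of X = ACDE with Y = ACFDE that place the one-residue gap differently.
a1 = ("AC.DE", "ACFDE")
a2 = ("ACD.E", "ACFDE")
print(alignment_distance(a1, a2))  # 1
```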

Each term in the summation is the distance along sequence Y between amino acids $y_j$ and $y_k$, the amino acids aligned with amino acid $x_i$ of sequence X in the two alignments being compared. Since $D_{A_1:A_2}$ is a distance, $D_{A_1:A_2}$ is equal to $D_{A_2:A_1}$. This distance, $D_{A_1:A_2}$, can be easily visualized in terms of path graphs of the alignments (Figure 2). The path graph of an alignment of two sequences is a rectangular plot with sequence X listed along the horizontal axis and sequence Y listed along the vertical axis. This plot is composed of square cells, one for each pair of amino acids $x_i$ and $y_j$. Each cell is filled in one of four possible ways. If the pair of amino acids $x_i$ and $y_j$ are aligned with each other in the alignment (either a match or a mismatch), the cell is filled with a diagonal line. If either of the amino acids $x_i$ and $y_j$ is part of a gap (an insertion or deletion), the cell is filled with a vertical line if the gap is in sequence X or a horizontal line if the gap is in sequence Y. If the pair of amino acids $x_i$ and $y_j$ are not part of the alignment, the cell is empty. Figure 2 shows path graphs for the alignments of the Catd_H and Pepa_A fragments generated by the MSA and PileUp programs and shown in Figure 1.
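A text rendering of a path graph that follows the cell rules above can be sketched in Python. This is our illustration, not the software used to draw Figure 2. Y runs down the page, so aligned pairs are drawn as '\'; a row 0 and column 0 are an added convention to hold gaps that come before the first residue; and '.' marks an empty cell. The function name is hypothetical.

```python
from typing import List


def path_graph(row_x: str, row_y: str, gap: str = ".") -> str:
    """Render a text path graph: X across the top, Y down the left side.

    '\\' = aligned pair, '|' = Y residue facing a gap in X, '-' = X residue facing a
    gap in Y, '.' = empty cell. Row and column 0 hold gaps before the first residue.
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have equal length")
    x, y = row_x.replace(gap, ""), row_y.replace(gap, "")
    grid: List[List[str]] = [["."] * (len(x) + 1) for _ in range(len(y) + 1)]
    i = j = 0
    for a, b in zip(row_x, row_y):
        if a != gap and b != gap:
            i, j, mark = i + 1, j + 1, "\\"  # aligned pair: diagonal step
        elif b != gap:
            j, mark = j + 1, "|"             # gap in X: vertical step
        elif a != gap:
            i, mark = i + 1, "-"             # gap in Y: horizontal step
        else:
            continue                         # column gapped in both rows
        grid[j][i] = mark
    lines = ["   " + x]  # label column, a space, then column 0
    for jj, cells in enumerate(grid):
        label = y[jj - 1] if jj else " "
        lines.append(label + " " + "".join(cells))
    return "\n".join(lines)


print(path_graph("AC.DE", "ACFDE"))
```

For this example the diagonal marks step down and to the right, with a single '|' where F in Y faces the gap in X. Drawing two alignments of the same pair into one grid makes the cells lying between their paths easy to see.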


Figure  2.  Path  graphs  for  the  MSA  and  PileUp  alignments. 
Table 2.  Distances between alignments by different programs

(a)  Alignments containing Carp_Y

                      Structure   MSA         AMPS        PileUp
Structure   Max       -           1194        1143        1385
            Min       -           81          87          97
            Avg       -           951.2       915         986
            Median    -           1156        1108        1099
MSA         Max       1155        -           68          197
            Min       81          -           1           10
            Avg       923.4       -           31          88.4
            Median    1130        -           25          57
AMPS        Max       1141        77          -           171
            Min       87          1           -           15
            Avg       909.8       24.4        -           85
            Median    1103        18          -           55
PileUp      Max       1227        251         237         -
            Min       97          10          15          -
            Avg       944.8       120.6       121.6       -
            Median    1145        74          75          -

     Alignments containing Catd_H

(b)  Alignments containing Pepc_M

                      Structure   MSA         AMPS        PileUp
Structure   Max       -           1156        1126        1385
            Min       -           64          105         170
            Avg       -           507         530.4       749.6
            Median    -           115         177         445
MSA         Max       1138        -           70          241
            Min       75          -           12          28
            Avg       509.6       -           33.8        115.6
            Median    115         -           27          74
AMPS        Max       1108        83          -           218
            Min       61          68          -           5
            Avg       507         74.5        -           102.2
            Median    111         76          -           75
PileUp      Max       1146        251         237         -
            Min       66          102         102         -
            Avg       619.6       202.8       187.8       -
            Median    521         223         211         -

     Alignments containing Pepa_A

(c)  Alignments containing Chym_B

                      Structure   MSA         AMPS        PileUp
Structure   Max       -           1187        [illegible] 1285
            Min       -           30          33          170
            Avg       -           508.4       486.8       401.8
            Median    -           115         105         288
MSA         Max       1194        -           83          223
            Min       30          -           18          28
            Avg       504.8       -           37.2        98.8
            Median    75          -           27          58
AMPS        Max       1143        76          -           211
            Min       33          33          -           5
            Avg       500.6       52.8        -           94.4
            Median    125         46          -           67
PileUp      Max       1109        210         214         -
            Min       66          102         102         -
            Avg       594.4       154.2       151         -
            Median    445         144         143         -

     Alignments containing Carp_R


Table 2 is composed of six triangular subtables arranged in pairs as the upper right and lower left triangles of a square table. The three square tables, (a) to (c), together make up the complete table. Each triangular subtable contains summaries of the distances between alignments generated by all pairs of the alignment programs used in this study. Each summary is based on the alignment of a single aspartyl protease sequence with each of the other five aspartyl protease sequences in the data. The single aspartyl protease sequence present in all five alignments of a subtable is indicated above the upper right subtable or below the lower left subtable. The summary includes the maximum distance, the minimum distance, the average distance, and the median distance. Giving both the average and the median provides information on the asymmetry of the distribution of distances.


The distance, $D_{A_1:A_2}$, is the count of the number of cells between the two path graphs plotted in the same rectangular plot. Thus $D_{A_1:A_2}$ is easily conceptualized and visualized as the area between the path graphs. For the path graphs shown in Figure 2, the distance $D_{\mathrm{MSA:PileUp}}$ is 146.

Application  of  the  Metric  to  Aspartyl  Protease  Alignments 

The entire sequences of the six aspartyl proteases in the data were aligned by the MSA program, the AMPS program, the PileUp program, and structural superposition. The distances were computed for every pair of sequences and every pair of programs. These distances are summarized in Table 2 in six largely independent subtables. Each subtable summarizes five alignments and shares one pair of aligned sequences with each other subtable.
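Each subtable entry is a simple summary over the alignments that contain one sequence. A Python sketch follows, with hypothetical placeholder values rather than the distances behind Table 2. The dictionary layout, keyed by an unordered pair of sequences and an unordered pair of methods, is our own assumption, and the names are hypothetical.

```python
from statistics import mean, median
from typing import Dict, FrozenSet, Tuple

Key = Tuple[FrozenSet[str], FrozenSet[str]]  # (pair of sequences, pair of methods)


def make_key(s1: str, s2: str, m1: str, m2: str) -> Key:
    return frozenset((s1, s2)), frozenset((m1, m2))


def summarize(distances: Dict[Key, float], seq: str, method_a: str, method_b: str) -> Dict[str, float]:
    """Max/Min/Avg/Median of the distance between the two methods' alignments,
    taken over every alignment of `seq` with one of its partner sequences."""
    methods = frozenset((method_a, method_b))
    values = [d for (seqs, meths), d in distances.items() if seq in seqs and meths == methods]
    if not values:
        raise KeyError(f"no distances for {seq} with {method_a}/{method_b}")
    return {"Max": max(values), "Min": min(values), "Avg": mean(values), "Median": median(values)}


# Hypothetical values for one sequence (S1) and three partners; not data from Table 2.
distances = {
    make_key("S1", "S2", "MSA", "AMPS"): 10,
    make_key("S1", "S3", "MSA", "AMPS"): 20,
    make_key("S1", "S4", "MSA", "AMPS"): 60,
    make_key("S2", "S3", "MSA", "AMPS"): 99,   # does not involve S1, so it is ignored
    make_key("S1", "S2", "MSA", "PileUp"): 7,  # different pair of methods, also ignored
}
print(summarize(distances, "S1", "MSA", "AMPS"))
```

Here the average (30) exceeds the median (20) because one large distance skews the distribution, which is the kind of asymmetry that reporting both statistics is meant to reveal.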
Examination of the subtables shows a consistent pattern of distances between the different methods of alignment. The MSA and AMPS programs consistently give the most similar alignments. The largest difference is between the structural superposition alignments and the alignments of any of the three sequence aligning programs. In five of the six subtables, PileUp alignments are more different from the structural superposition alignments than are the MSA and AMPS alignments.

Based on these results and the detailed analysis of the sections of alignments in Figure 1, we hypothesize that differences in the gap function (i.e., the presence of gap-opening and gap-extension terms, or how gaps are counted) play a critical role in the variations observed among the methods of generating alignments. This hypothesis, based on a small data set from one protein family, is very tentative. However, it provides a basis for further experimentation.


CONCLUSIONS 

We  have  developed  a  new  measure  for  determining  the  distance  between  alignments. 
This  new  measure  has  significant  advantages  over  previously  used  measures  (e.g.,  the 
percentage  of  identically  aligned  amino  acids).  The  most  important  feature  of  this  measure 
is  that  it  meets  the  mathematical  criteria  for  being  a  distance.  This  allows  us  to  legitimately 
compare  several  different  alignments  simultaneously  and  eliminates  the  need  to  declare  one 
of  the  alignments  as  a  standard.  A  second  important  feature  of  our  measure  is  that  it  captures 
the  influence  of  both  the  size  and  locations  of  gaps  in  the  sequences  in  a  quantitative  and 
graduated  manner  rather  than  on  a  binary  identical  or  not  identical  scale. 

These features have been gained without introducing confusing obscurity or abstractness into the discussion of alignments; rather, the measure has a simple interpretation that is readily visualized and displayed graphically. The graphical display, in fact, enhances the discussion of alignment differences by hiding the overwhelming detail of specific amino acids while illuminating the different patterns of gaps along the sequences. In terms of the sequences, each element in the summation computing the measure is simply the distance between the two different amino acids in the second sequence that are aligned with the same amino acid in the first sequence in the alignment. The total measure is the sum of all of those distances. Another advantage is that our alignment distance extends in an obvious and direct way to comparing alignments of more than a pair of sequences.
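The text does not spell out the extension to more than two sequences. One natural candidate, offered here only as our assumption, applies the pairwise distance to every pair of rows and sums the results. The pairwise part uses an assumed convention: for each residue of the first row, j is the index of the second row's residue in the same column, or of the last residue before it when the first row's residue faces a gap. Columns in which both rows have gaps are skipped, which amounts to comparing the induced pairwise alignments. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def _y_index_per_x(row_x: str, row_y: str, gap: str) -> List[int]:
    """Index of the row_y residue juxtaposed with each residue of row_x (assumed convention).

    Columns gapped in both rows change neither counter, so they are skipped implicitly.
    """
    j, out = 0, []
    for a, b in zip(row_x, row_y):
        if b != gap:
            j += 1
        if a != gap:
            out.append(j)
    return out


def msa_distance(msa1: Sequence[str], msa2: Sequence[str], gap: str = ".") -> int:
    """Hypothetical extension: sum of pairwise distances over all pairs of rows."""
    if len(msa1) != len(msa2):
        raise ValueError("alignments must contain the same number of rows")
    for r1, r2 in zip(msa1, msa2):
        if r1.replace(gap, "") != r2.replace(gap, ""):
            raise ValueError("rows must hold the same sequences in the same order")
    total = 0
    for r, s in combinations(range(len(msa1)), 2):
        j = _y_index_per_x(msa1[r], msa1[s], gap)
        k = _y_index_per_x(msa2[r], msa2[s], gap)
        total += sum(abs(p - q) for p, q in zip(j, k))
    return total


m1 = ["AC.DE", "ACFDE", "AC.DE"]
m2 = ["ACD.E", "ACFDE", "ACD.E"]
print(msa_distance(m1, m2))  # 2
```

Because each term is a pairwise distance, the sum inherits symmetry and the triangle inequality; whether it is the most informative extension is a separate question.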

This measure of alignment distance provides a useful tool for investigating all aspects of sequence alignment. Even on this small data set it allows us to show that sequence-based alignment methods give appreciably different results from methods based on structural superposition.
horse heart cytochrome c, 114
RNase A, 115

yeast enolase applied to PVDF, 113
of proteins applied to Zitex membranes, 119-129, 131-138

cleavage reaction of the peptidylthiohydantoin, 121

C-terminal coupling, 121, 133
cyclization reactions, 121
HPLC analysis of thiohydantoin amino acid derivatives, 121-123, 133
of beta-lactoglobulin A, 126
of hemoglobin A chain, 125
of human serum albumin, 127
of ovalbumin, 137
of polyproline, 136
of superoxide dismutase, 124
sample application, 120, 133
reaction mechanism of, 97-103
C-terminal sequencing (cont.)
reaction methods of, 90-97, 131
thiocyanate method, 131
thiohydantoin, 110, 131
trimethylsilanolate, 131
using perfluoroacyl anhydrides vapor, 89-107
using the Hewlett-Packard sequencing system, 119-129

sequencing criteria, 120
with MS detection, 154-155
Conotoxin, 157

Database analysis (see sequence analysis)
Disulfide bonds, 152, 156-157
assignment in C9, 211
intramolecular rearrangements, 484-488
mass spectroscopy of proteins containing, 156-157
reduction of, 485-486, 489-490
synthetic α-bungarotoxin peptides containing, 313

Edman degradation, 57, 81
manual, 50, 52

of cross-linked ribosomal peptides, 279
of cyanogen bromide BSA fragment, 44
of 50 pmol of β-lactoglobulin, 41, 42, 43
separation of products from, 52
solid phase, 72

covalent attachment in, 72
Edman reagents

colored, water soluble (see S-DABITC), 39-45

3-[4'(ethylene-N,N,N-trimethylamino)phenyl]-2-isothiocyanate (PEIMA-PITC), 58, 59, 60

phenyl isothiocyanate, 57, 58, 66

4-(3-pyridinylmethylamino)carboxypropyl PITC (PITC 311), 6, 59, 60, 61, 62, 65, 66

Electrotransfer  of  protein  and  peptides 
electroblotting,  5,  84 
on  membrane  digestion,  163 
Epidermal  growth  factor,  154-155 

Filamentous  bacteriophage  fd 
assembly  of,  343-351 
DNA  packaging,  345 
DNA-protein  interactions  in,  343-351 
foreign peptides, 346-350
accessibility of, 349-350
immunological properties of, 347-348
structural mimicry of, 348-349
surface display of, 346-347
limited proteolysis of, 349
mutagenesis of major coat protein of, 344
protein-protein interactions in, 343-351
X-ray  fiber  diffraction  analysis  of,  345 
Fluorescence  methods 

for  monosaccharide  determination,  195-204 
for  sialic  acid  determination,  204-205 


GABAA receptor
acute application of agonists, 386
chronic exposure to agonists, 382-385
degradation of, 386
effect on ³H-flunitrazepam binding, 385
effect on ligand binding, 384
effect on receptor surface density, 384
internalization from the neuronal surface, 381-388
labeling  with’^Sj.ppggt^  382-383 
Gel  electrophoresis,  152,  155,  156 
agarose  gel,  18,  329,  330 
characterization of proteins separated by, 3-14, 15-26, 446-447
differential mobilities of proteins in, 7-9
in-gel concentration of proteins, 17-24
in-gel proteolysis, 20, 30
one-D or 2-D gels, 15-26, 162-163, 298, 301, 446-447
protein isolation from 2D gels for MS, 155-156
SDS-polyacrylamide,  16,  29 
Glycopeptides and glycoproteins
carbohydrate composition, 195
characterization of, 197-198
desialylation of, 71
β-galactosidase digestion of, 71, 78-79
hydrolysis of, 196
monosaccharides, 195
analysis of derivatives by HPLC, 196, 198-201
anthranilic acid derivatives of, 196, 200, 202
determination in, 195-204
oligosaccharides, 195
preparation from bovine κ-casein, 71
sialic acid determination in, 195, 197, 199, 203-205
tryptic, preparation of, 71

HMG proteins, 175-183
alignment of peptides of, 182
mass  mapping  of  peptides  of,  179,  180,  181 
N-terminal  acetylation  of,  175 
preparation  of,  176 
proteolysis  of,  176,  179 
HPLC
monosaccharide determination by, 195-204
anthranilic acid derivatives of, 196
narrowbore reverse phase, in microsequencing, 16
of cross-linked ribosomal proteins, 279
of thiohydantoin amino acid derivatives, 121-123, 133
of PTH-amino acids, 57
of transducin tryptic peptides, 235
on line, 72
reverse phase, purification of peptides, 5, 27, 163, 168-171
sialic acid determination by, 197, 204-205
o-phenylenediamine derivatives of, 197
quinoxaline derivatives of, 204-205
Human leukemic lymphocyte (MOLT4)
cell culture of, 296-297
exposure to carcinogenic Cr(VI), 296-297
proteins of, 295-307
amino acid sequence of, 298, 302
electroblotting  of,  298-299 
DNA-complexes  of,  297,  299 
analysis  by  2D  gels,  298,  301 
effect  of  Cr(VI)  on,  299 
isolation  of,  297,  298 
stability  of,  297-298,  299,  300 
Human  DNA-activated  protein  kinase 
phosphorylation by, 398-403
of  p53  peptide  substrates,  400 
of  synthetic  peptides,  399 
of  POU  domain,  401 
of  RNA  polymerase  peptide,  402 
structure  of,  396-397 
substrate specificity of, 397-403
Human fibrinogen
common inherited variants, 441
function, 436, 437
heterogeneity, 436, 437, 438
posttranslational modification, 438
non-inherited variants, 438-441
alternative processing, 438
cyclization of N-terminal glutamine, 439
deamidation, 440
glycosylation, 440
methionine oxidation, 439
phosphorylation, 438-439
proline hydroxylation, 439
proteolytic degradation, 440-441
sulfation, 439
rare, inherited variants, 442
structure,  435-436 
Human  leucocyte  collagenase 
amino  acid  sequence  of,  391 
catalytic  domain  of,  392 
domain  structure  of,  390 
function  of,  389-394 
specific  granules,  389 
structure  of,  390-393 
three-dimensional  structure,  392 

Immunological recognition
of acetylcholine receptor (see acetylcholine receptor), 311-326
of α-bungarotoxin (see α-bungarotoxin), 319-326
of filamentous bacteriophage (see filamentous bacteriophage), 347-351
of oxidized low density lipoproteins (see low density lipoproteins), 327-334

Low  density  lipoproteins  (LDL) 
antibodies  against,  328 
antioxidants  effects  on,  328 
apoB-100 in (see apoB-100), 328, 330-331, 355-367
apolipoprotein,  327 
class A and class B, 211, 213
immunological approaches to structure of, 327-334
immunoreactivity  of,  328-329,  330-331 
after  oxidation,  328-329,  330-331 
in  atherosclerosis,  327 
monoclonal  antibody  against,  328 
oxidation  of,  329 
receptor  of,  327 

Mass  spectroscopy,  141-183 
array  detector,  151-156 
collisional  activation,  144 
C-terminal sequence analysis with, 154-155
electrospray ionization (ESI), 58, 61-62, 63, 66, 141, 142, 151, 152
of disulfide containing proteins, 156-157
of proteins from 2D gels, 155-156
of PTH amino acid derivatives, 61-62
of salt containing samples, 153-154
fast atom bombardment (FAB MS), 89, 93, 143-144
of C-terminal truncated peptides, 93
high resolution, 156-157
magnetic sector device, 152, 156-159
matrix assisted laser desorption ionization time of flight (MALDI), 17, 144, 151-152, 175-183
of cellular proteins, 163
of HMG proteins, 175-183
protein digestion procedure for, 163-165
sequence of N-terminal blocked proteins by, 175-183
peptide  mass  fingerprinting,  21,  161-173,  188 
sample  extraction  from  blots  for,  152 
tandem,  58,  141-150 
analysis  in,  143-144 
enzymatic  digestion  of  proteins  for,  143 
Mass transfer kinetics
of conventional packings, 30-32
of  macroporous  packings,  30-32 
Microdigestion reactor, 9, 176
Microsequencing, 51-52, 163-164, 299, 445-453 (see N-terminal sequence analysis)
Modification of proteins and peptides (also see amino acid modification)
acylation,  175,  330-332 
deacetylation,  81,  82 
glycosylation,  440 
methionine  oxidation,  439 
phosphorylation,  9,  438 
proline  hydroxylation,  439 
Modified residues
de-novo characterization of, 11-13
localization  within  a  polypeptide  sequence,  9-11 

α-Neurotoxins
binding regions on human AChR for, 312-315, 316
binding regions on Torpedo AChR for, 312-315, 316
interaction with acetylcholine receptor, 311-326
N-terminal  sequence  analysis,  49-88,  163-164 
automated, 51-52, 61-66
Edman degradation (see Edman degradation)
Edman reagents for (see Edman reagents)
from PVDF membranes, 163-164, 299, 445-453
internal cleavage of proteins for, 81-86
miniaturized, 52
of glycopeptides (see glycopeptides)
PTH amino acids (see PTH amino acids)
reagents  for  stepwise  degradation,  57-68 
Nuclear magnetic resonance
application to protein structure determination, 494-501
heteronuclear, 493-503
one bond, 495
relation between 2D, 3D and 4D, 496
multidimensional, 494-501
pulse sequence in, 494
of interleukin-1β, 498

p53  tumor  suppressor  protein 
carboxyl terminus of, 409
domains  of,  410,  414 
oligomerization  of,  412 
overexpression  of,  407-408 
phosphorylation  of,  394,  409 
tetramerization  domain  of,  410-415 
three-dimensional  structure  of,  413 
Peptides
characterization by tandem mass spectrometry, 141-150
cross-linked to rRNA, 275-282
cross-linking of, 276
cross-linking sites, 277
isolation of peptide-oligonucleotide heteromers, 276-277
high sensitivity microsequence analysis of, 27-45
high speed chromatographic separation of, 27-45
mapping by 2D-gel electrophoresis, 35
mass spectroscopy of (see mass spectroscopy)
preparation for microsequence analysis, 3-48
purification by HPLC, 163
rapid mapping of, 34-35
reagents for stepwise degradation of, 57-68
strategies for characterization of, 187-307
synthetic, of acetylcholine receptor, 312, 313
synthetic, of α-bungarotoxin, 315-324
Phospholipase C
effect on intracellular degradation of apoB-100, 362
induction of protein kinase C, 360-361
treatment of Hep G-2 cells, 360-361
Phosphorylation
and DNA damage, 395-396
determination of, with ethanethiol, 217-225
limitation of technique, 221-223
of ApoB-100 (see ApoB-100)
of hsp90, 398
of OCT-1 POU domain, 400-401
of p53, 394, 409
of RNA polymerase II CTD, 401-402
of  SV40  T-antigen,  398 
sites  in  proteins,  217-225 
Polypeptides,  internal  cleavage 
by CNBr, 82
enzymatic digestion, 143, 176, 179, 349
in-gel proteolysis, 20, 30
influence of residues, 168
microdigestion reactor, 9, 176
tryptic digestion, 66, 67
Polyvinylidene  difluoride  (PVDF)  membrane 
application to C-terminal sequencing, 109-118
of β-lactoglobulin, 116
of horse apomyoglobin
of horse heart cytochrome c, 114
of RNase, 115
of yeast enolase, 113
application to N-terminal sequencing, 163-164, 299
Post-translational  modification 
of  apoB-100  355-367 
of  fibrinogen,  438 
Pre-electrophoretic labeling
of proteins (see S-DABITC), 39-45
Proline  thiohydantoin,  synthesis  of,  132 
Protection  index,  322,  324 
Protective immunity
by immunization with α-bungarotoxin, 342
by immunization with α-bungarotoxin peptides, 322-324
Protein
aging (see aging protein)
binding capacity of conventional packings, 30-32
binding capacity of macroporous packings, 30-32
blocked N-terminal, 81
chemical cleavage of (see Polypeptides, internal cleavage)
database, 457-525
disulfide, spontaneous formation of, 241-244
-DNA interactions, 305, 343-351
docking (see protein docking), 505-514
endoproteolysis of (see Polypeptides, internal cleavage)
folding (see protein folding)
identification, 4-6
modification (see modification of proteins)
preparation for microsequence analysis, 3-48
-protein interactions, 283-293, 343-351
sequence analysis (see sequence analysis)
three-dimensional structures, 413, 457-525, 493-503
Protein docking
basic algorithm, 506
hemoglobin, 318, 507, 508
low-resolution structures, 506
methods, 506-507
modified algorithm, 506-507
molecular complexes, 510
of acetylcholine receptor-α-bungarotoxin, 318-319
of antigen-antibody, 508, 511
of  enzyme-inhibitor,  508,  509 
of  ligand-receptor,  505 
trypsin  and  trypsin  inhibitor,  508 
Protein folding, 457-525
bovine pancreatic trypsin inhibitor (see BPTI)
Cartesian coordinates, 463
computer simulation, 457
criterion of foldability, 458
density of states, 458
determination by NMR, 493-503
diffusion equation method, 461-464
docking (see protein docking)
domains, 473-481
evolutionary studies, 475
inference from homology, 478-480
limitations of original superfamily concept, 474
first-order transition, 459
folding transitions, 458
Fourier-Poisson integral, 463
global minimization, 457
homology domain superfamilies, 478-480
internal coordinates, 463
lattice models, 457
Lennard-Jones particles, 461
Monte Carlo simulations, 461
multiple minima problem, 457
of met-enkephalin, 461
of model polypeptides, 458
sequence data organization, 477-478
statistical mechanical aspects, 457, 458-461
superfamily, 473-481
original concept, 473
limitations of, 474
revised concept, 475-476
thermodynamic properties, 458
thermodynamically-stable native structure, 457
Protein kinase
DNA-activated, human (see human DNA-activated protein kinase), 395-406
effect on secretion of apoB-100, 359
in phosphorylation of apoB-100, 356, 359, 362-365
induction  by  phospholipase  C,  360-361 
tyrosine  kinase,  10 
Protein  kinases,  mixed  lineage 
basic  motif  of,  369-380 
catalytic domain of, 373-375
characterization  of,  370-371 
chromosomal  localization  of,  370-371 
C-terminal  region  of,  378 
double leucine zipper domain of, 375-378
expression  of,  370-371 
isolation  of,  370-371 
new  family  of,  369-380 
SH3  domain  in,  371-373 
structural domains of, 371-378
Proteins
acetylcholine receptor (see acetylcholine receptor)
adrenodoxin (see adrenodoxin)
alcohol dehydrogenases (see alcohol dehydrogenases)
carbonic anhydrase, bovine, 66, 67
cellular, 161-173
identification of, by MS fingerprinting, 161-173
peptide mapping of, 171-172, 187-193
phosphorylation (see phosphorylation)
characterization from 1-D or 2-D gels, 15-26, 187-193
complement pathway components (see complement C9)
cross-linked to rRNA, 276
C-terminal sequencing of (see C-terminal sequencing)
differential mobilities of, 4-7
effect of linear flow velocity in chromatographic separation of, 32-34
enzymatic digestion of, 143
for MALDI, 163-165, 176-177, 179-183
for tandem mass spectrometry, 143
influence of adjacent residues on, 168
fibrinogen (see human fibrinogen)
from acrylamide gels, 34
rapid peptide-mapping of, 34-35
GABAA receptor (see GABAA receptor)
high sensitivity microsequence analysis of, 27-45
high speed chromatographic separation of, 27-45
histones, 155-156
HMG proteins (see HMG proteins)
immunological recognition (see immunological recognition)
in-gel concentration of, 17-24
insulin B chain, 144
internal cleavage of, for sequencing, 81-86
β-lactoglobulin, 116, 126
LDL (see low density lipoproteins)
lectins, 302
mass spectrometry (see mass spectroscopy)
mucins, 253, 255
multidomain, 474
N-terminal blocked, sequence analysis of by MALDI, 175-183
N-terminal sequencing of (see N-terminal sequencing)
ovalbumin, 135, 136, 137
osteopontin, 223
pre-electrophoretic labeling of, 39-45
protein kinase (see protein kinase)
ribosomal (see ribosomal proteins)
strategies for characterization of, 187-307
transducin (see transducin)
PTH amino acids
analysis of derivatives by ESI-MS, 61-62
calculation of corrected yields, 73
glycosylated, 69-80
characterization of, 69-80
chromatography of, 78
after β-galactosidase treatment, 75-77
identification of, 69-80
monosaccharide composition analysis of, 73, 75, 78
separation of, 74
high sensitivity analysis of, 49-55
separation of, 73

Rapid  peptide  mapping 
examples of, 35
of acrylamide gel resolved proteins, 34-35
of  2-DE-resolved  proteins,  35 
Reagents  for  stepwise  degradation 
application  of,  57-68 
Edman  reagents  {see  Edman  reagents) 
evaluation  of,  57-68 
evolution  of  design,  58-61 
perfluoroacyl  anhydride  vapor,  89 
synthesis  of,  57-68 
Ribosomal  proteins,  275-282 
cross-linking  to  rRNA,  276 
cross-linking sites of, 277-278
peptide-rRNA cross-links within, 280-281
protein S7, binding domain of, 278-280
protein S17, binding domain of, 278-280
Rough endoplasmic reticulum (RER) membrane, 445-453
phase partitioning with Triton X-114, 446, 449
two  dimensional  electrophoresis,  446-447,  448 
blotting  onto  PVDF  membranes,  447 
sequencing  from,  447,  450 

S-DABITC
pre-electrophoretic labeling of proteins with, 39-45
method of, 40
of BSA fragments, 44
of β-lactoglobulin, 41-42
uses  of,  45 
SDS PAGE
of labeled proteins, 41
of  cyanogen  bromide  BSA  fragments,  44 
Sequence  analysis 
ALIGN program, 471
alignments, 182, 515, 517-523
artifacts of, 515
comparative statistics for, 517
comparisons of, 515
distances between, 515, 520-523
applications to aspartyl protease, 523-524
computation of, 520-523
examples of, 517-518
gaps in, 516, 520
glycine aspartic acid region, 519
glycine serine dipeptide
metric measure for, 515-525
multiple sequence, 478, 516-517, 518
of aspartyl proteases, 517
programs of, 516
progressive strategy for, 517
calmodulin superfamily, 475
database, 5, 466-467, 473-480
default parameters, 468
domains, 474-476, 478-480
FASTA database, 477
gene duplication, 474
graphics window, 469-470
homology inference, 478-480
integration of windows, 470
organization of sequence data, 477-478
PIR international, 473-480
protein families, 475-478
sequence window, 468-469
source data formats, 468
structure window, 469
superfamily classification, 473-480
thrombospondin type 1 repeats, 211
Vistas program, 467-468
comparison with other software, 471
interfaces to other programs, 470
Sequencing  of  proteins  and  peptides 
automated, 51-52
with PITC 311, 61-66
from the C-terminus (see C-terminal sequencing)
from the N-terminus (see N-terminal sequencing)
internal, 81-86
miniaturized,  27-45,  52 
Site-directed  mutagenesis 
of adrenodoxin, 286
of mitochondrial steroid hydroxylase systems, 283-293
Steroid hydroxylase, mitochondrial, 283-293
Structure  analysis 
characterization,  15 
Stylar RNase, 429-434
fractions, 430-431
amino acid sequence of RNase MSI, 431-432
comparison with Solanaceae S-RNase, 432-433
assignments of to S- and non-S RNase, 431
separation from the stylar of N. Atlanta, 430-431
Synthetic libraries (see combinatorial libraries), 335-342

Three-dimensional structure
of α-bungarotoxin binding cavity on AChR, 318-319
of p53 protein, 413
of proteins (see protein folding)
T-lymphocytes
recognition of α-bungarotoxin by (also see α-bungarotoxin), 319-322
recognition of α-bungarotoxin peptides by, 322, 323
Transducin
assay of functionality in T-R* complex, 230, 235
binding of [³H]GMP-PNP to, 229
binding of [³H]GTP to, 229
carboxyl groups, derivatization of, 237
cross-linking of, 230-231, 239
electrophoresis of, 231
functional cysteines in, 227-250
glycine residues in, 227-250
HPLC separation of tryptic peptides of, 235-237
interaction with photoexcited rhodopsin, 230, 232
isolation of, 229
lysyl groups, derivatization of, 238-239
sulfhydryl  groups 
labeling  of,  230,  232 
cross-linking  of,  239-241 

Vaccine, synthetic
against toxin poisoning, 311-326
of synthetic α-bungarotoxin peptides, 322-325

X-Ray  photoelectron  spectroscopy 
high resolution spectra, 256-258
in surface analysis, 251
low resolution spectra, 253-256
of amino acids, 251-260
of carbohydrates, 251-260
of mucins, 253
of polypeptides, 251-260


